Overview

Microbes contribute to aquatic ecosystem function and the fitness of macroscopic organisms, including zooplankton. Many factors affect the taxonomic compositions of free-living (bacterioplankton) and zooplankton-associated microbial communities in lakes, yet how these communities vary seasonally and among lakes remains poorly understood. Here we investigate how free-living bacterial communities and those associated with different crustacean zooplankton hosts change in response to fluctuations in their natural environment across time and space. Here we use repeated sampling of bacterioplankton, zooplankton communities, zooplankton microbiomes (16S-rRNA), and water chemistry parameters of six lakes in the eastern Sierra Nevada mountains of California across a summer season. We tested the influence of environmental conditions and spatial proximity as drivers of free-living and host-associated zooplankton. In addition, we tested whether microbiomes of zooplankton were distinct or similar among taxonomically diverse hosts and whether space or host identify best structured these communities.

knitr::opts_chunk$set(warning=FALSE, message=FALSE, fig.align='center', cache=TRUE)

# load packages
if (!require("pacman")) install.packages("pacman") # for rapid install if not in library


devtools::install_github("benjjneb/dada2", ref="v1.20") # update to most recent dada2
remotes::install_github("microbiome/microbiome")
remotes::install_github("kstagaman/phyloseqCompanion")

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!require("phyloseq", quietly = TRUE))
    BiocManager::install("phyloseq",  quietly = TRUE)

if (!require("dada2", quietly = TRUE))
    BiocManager::install("dada2", quietly = TRUE)

# use pacman to load CRAN packages missing
pacman::p_load('knitr', 'microbiome', 'phyloseq', 'tidyr', 'tidyverse', 'knitr', 'magrittr', 'effects', 'devtools', 'stringi', 'dplyr', "ggplot2", "gridExtra", "dada2", "phyloseq", "vegan", "cowplot", 'doBy', 'ecodist', 'phyloseqCompanion', 'pairwiseAdonis', 'glue', 'geosphere', 'data.table', 'patchwork', 'car', 'ggcorrplot', 'FactoMineR', 'devtools', 'emmeans', 'rmarkdown', 'reshape', 'lattice',  'plyr', 'magrittr', 'factoextra', 'multcompView', 'decontam', 'factoextra', 'car', 'mia', 'ade4', 'fossil', 'picante', 'reshape', 'readr', 'corrr', 'Hmisc', "MASS", "FSA", "sciplot", "decontam","BiocManager", 'ggpubr', 'ggmap', "ggordiplots", "fossil", "igraph", "huge", "leaflet", "ggtree", "colorBlindness", "RColorBrewer","MicrobiotaProcess", "predict3d", "ggiraphExtra", "data.table", "betapart", "naniar", "rstatix")

Site Map

The map of lakes and their proximity can be seen below. We sampled zooplankton, water, and biogeochemical variables in lakes in the Eastern Sierra Nevada mountains of California, USA: Eastern Brook, Serene, Convict, Cooney, Blue, and Virginia.

Figure. 1. Sierra Nevada Lakes sampled for zooplankton microbiomes and environmental conditons across a summer season.
Figure. 1. Sierra Nevada Lakes sampled for zooplankton microbiomes and environmental conditons across a summer season.

Environmental Data

Load in the environmental field data and create plots for environmental conditions and water nutrients/chemistry across all lakes and 6 sampling points. - first remove outliers, check data structure, and add some carpentry to the metadata data frame. - then load in nutrient and water chemistry data (collected in situ) - then load in air temperatures and precipitations data from sampling stations in Mammoth Lakes.

field.df<-read.csv("data/field_metadata.csv")
field.df$date.simple<-field.df$time_point
field.df$date.simple<-recode_factor(field.df$date.simple, 
                                "1" = "25-Jun",
                                "2" = "6-Jul",
                                "3" = "21-Jul",
                                "4" = "4-Aug",
                                "5" = "22-Aug")


str(field.df)
field.df$TDP..ug.L<-as.numeric(field.df$TDP..ug.L)
field.df$TDN..ug.L<-as.numeric(field.df$TDN..ug.L)

## replace outliers in nutrient analysis
field.df<-field.df %>% 
  naniar::replace_with_na(replace = list(TDP..ug.L = 4102.48))

# order the lakes by North to South or upstream in phyloseq 
field.df$lake <- factor(field.df$lake, levels = 
                                    c("Cooney", "Blue", "Virginia", "Convict", "Serene", "Eastern Brook"))
# make meters
field.df$depth..m<-field.df$depth..ft*0.3048

#fix str
field.df$lake<-as.factor(field.df$lake)
field.df$time_point<-as.factor(field.df$time_point)

# depth mean and SE
mean.depth.m<-aggregate(depth..m~lake, FUN=mean, data=field.df)
mean.depth.SE<-aggregate(depth..m~lake, function(x) sd(x) / sqrt(length(x)), data=field.df)

colnames(mean.depth.m)<-c("lake", "depth..m")
colnames(mean.depth.SE)<-c("lake", "depth..m.SE")
depth.df<-cbind(mean.depth.m[1:2], mean.depth.SE[2])
######## TDP

TP.plot<-ggplot(field.df, aes(x=date.simple, y=TDP..ug.L, group=lake, color=lake)) + 
  geom_point(size=2)+
  geom_line() + 
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  xlab("Sampling Time Point") + 
  ylab(expression(paste("TDP", ~(mu*g~L^-1), sep = "")))+ theme_classic() + 
   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


######## TDN

TN.plot<-ggplot(field.df, aes(x=date.simple, y=TDN..ug.L, group=lake, color=lake)) + 
  geom_point(size=2)+
  geom_line() + 
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  xlab("Sampling Time Point") + 
  ylab(expression(paste("TDN", ~(mu*g~L^-1), sep = ""))) + theme_classic() + 
   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


####### DOC

DOC.plot<-ggplot(field.df, aes(x=date.simple, y=DOC_values, group=lake, color=lake)) + 
  geom_line() +  
  geom_point(size = 2)+  
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  xlab("Sampling Time Point") + 
  ylab(expression(paste("DOC", ~(mg~L^-1), sep = ""))) + theme_classic() + 
   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


####### chlorophyll

chla.plot<- ggplot(field.df, aes(x= date.simple, y=chla..ug.L, group=lake, color=lake)) + 
  geom_line() +  
  geom_point(size = 2)+  
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  xlab("Sampling Time Point") + 
  coord_cartesian(ylim=c(0, 2.5))+
  theme(axis.text = element_text(size=7)) +  
  ylab(expression(paste("Chl-a", ~(mu*g~ L^-1), sep = "")))  + theme_classic() + 
   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


######## Dissolved oxygen

DO.percent.plot<-ggplot(field.df, aes(x= date.simple, y=DO..perc, group=lake, color=lake)) + 
  geom_line() +  
  geom_point(size = 2)+ 
  coord_cartesian(ylim=c(60, 90))+
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  xlab("Sampling Time Point") + ylab("DO (%)") +
  theme(text = element_text(size = 10)) + theme_classic() + 
   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


######### Conductivity

conductivity.plot<-ggplot(field.df, aes(x= date.simple,  y=cond, group=lake, color=lake)) +
  geom_line() +  
  geom_point(size = 2)+ 
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  xlab("Sampling Time Point") +
  coord_cartesian(ylim=c(0, 180))+
  ylab(expression(paste("Conductivity", ~(mu*S~ cm^-1), sep = ""))) + theme_classic() + 
   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


###### Specific conductivity

spc.plot <- ggplot(field.df, aes(x= date.simple,  y=spc, group=lake, color=lake)) + 
  geom_line() +  
  geom_point(size = 2)+ 
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  xlab("Sampling Time Point") +
  coord_cartesian(ylim=c(0, 200))+
  ylab(expression(paste("SPC", ~(mu*S~ cm^-1)))) + theme_classic() +  
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


######## pH

pH.plot<-ggplot(field.df, aes(x= date.simple,  y=pH, group=lake, color=lake)) + 
  geom_line() +  
  geom_point(size = 2)+ 
  coord_cartesian(ylim=c(5, 10))+
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  xlab("Sampling Time Point") + theme_classic() +  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

######## Surface temp

laketemp.plot<-ggplot(field.df, aes(x= date.simple,  y=temp..C, group=lake, color=lake)) + 
  geom_line() +  
  geom_point(size = 2)+ 
  coord_cartesian(ylim=c(12, 22))+
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4")) +
  xlab("Sampling Time Point") + 
  ylab("Temperature (°C)") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


## get legend
extract.legend.env <- get_legend(
  # create some space to the left of the legend
  laketemp.plot + theme(legend.box.margin = margin(0, 0, 0, 10)))


#### combine plot
env.plots<-plot_grid(laketemp.plot + guides(color="none"), DO.percent.plot + guides(color="none"), 
          pH.plot + guides(color="none"), conductivity.plot + guides(color="none"), 
          chla.plot + guides(color="none"), DOC.plot + guides(color="none"), 
          TN.plot + guides(color="none"), TP.plot + guides(color="none"),
          ncol=4, rel_widths = c(7,7,7,7,7,7,7,1), labels=c("A", "B", "C", "D", "E", "F", "G", "H"))

env.plots
dev.copy(pdf, "figures/Fig2.env.plots.pdf", height=7, width = 12)
## pdf 
##   3
dev.off()
## quartz_off_screen 
##                 2
**Figure 2.** Environmental and water chemistry metrics measured in each of the six lakes across five sampling time points.

Figure 2. Environmental and water chemistry metrics measured in each of the six lakes across five sampling time points.

# air temp and rainfall

weather <- read.csv('data/weather.SNARL.csv', header = TRUE)
weather$date <- as.Date(weather$date, format="%m/%d/%y")
coeff<-0.5

weather.plot <- ggplot(weather, aes(x = date)) +
  geom_line(aes(y = avg.temp.air.C, color = "avg.temp"), linewidth = 1, linetype = "solid") +
  geom_line(aes(y = rain.total.mm*coeff, color = "rain.total"), linewidth = 1, linetype = "solid") +
  labs(x = "Date", y = "Average Air Temperature (°C)", title = NULL) + 
  scale_y_continuous(sec.axis= sec_axis(~./coeff, name = "Total Daily Precipitation (mm)")) +
  scale_x_date(breaks = "2 week", date_labels = "%b %d") +
  scale_color_manual( values = c("rain.total" = "steelblue", "avg.temp" = "orangered"), breaks = c("avg.temp", "rain.total")
  ) + theme_classic() +
theme(axis.text.x = element_text(angle = 90))

weather.plot
dev.print(pdf, "figures/FigS1.Air.Rain.snarl.plot.pdf", width = 6, height = 6)
dev.off()
**Figure S1.**  Average air temperature and total daily precipitation across the summer season in which lake sampling occurred.

Figure S1. Average air temperature and total daily precipitation across the summer season in which lake sampling occurred.

Zooplankton Community

Using the zooplankton community data, we will make plots for the density of plankton at each lake and at the different sampling points, creating a dot-line plot for each taxa. We will then make a relative abundance stacked-density plot to show the change in zooplankton community composition.

Plankton density

Generate a dot-line plot of the zooplankton taxa density (# plankton/L). We will use the most abundant and common taxa for these calculations.

Test for differences in zooplankton densities across lakes and time points.
Note Interactions aren’t approproate for these models since there is only n=1 in each lake/time combination. We therefore use Type II SS of main effects.

# load in the density data
# density is the number of zoops per liter
counts <- read.csv('data/Zoop.community.csv')
counts$time_point <- as.factor(counts$time_point)
counts$lake <- as.factor(counts$lake)
str(counts)
## 'data.frame':    30 obs. of  42 variables:
##  $ time_point       : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...
##  $ lake             : Factor w/ 6 levels "Blue","Convict",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ latitude         : num  38.1 38.1 38.1 38.1 38.1 ...
##  $ longitude        : num  -119 -119 -119 -119 -119 ...
##  $ elevation_m      : int  3005 3005 3005 3005 3005 2315 2315 2315 2315 2315 ...
##  $ tow.depth.ft     : num  49.2 42.7 42.7 42.7 42.7 60 32.8 59 55.8 49.2 ...
##  $ volume..cubic.ft.: num  37.4 32.5 32.5 32.5 32.5 ...
##  $ volume..L.       : num  1060 920 920 920 920 ...
##  $ split            : int  4 NA 2 3 1 4 1 3 1 2 ...
##  $ perc.count       : num  0.0625 1 0.25 0.125 0.5 0.0625 0.5 0.125 0.5 0.25 ...
##  $ Daphnia          : int  828 110 372 369 376 46 127 561 741 528 ...
##  $ Calanoid         : int  0 2 4 75 61 173 812 521 37 132 ...
##  $ Cyclopoid        : int  794 241 445 189 192 27 76 289 355 771 ...
##  $ Nauplii          : int  6 108 131 151 239 502 828 297 10 28 ...
##  $ Asplancha        : int  66 27 110 79 127 1 0 7 7 0 ...
##  $ Holopedium       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Polyphemus       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Bosmina          : int  0 94 9 23 30 19 8 237 33 16 ...
##  $ Ceriodaphnia     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Keratella        : int  0 16 0 13 684 0 30 144 31 162 ...
##  $ Kellicotia       : int  0 0 0 90 37 0 12 0 0 83 ...
##  $ Chydoridae       : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ daphnia.dens     : num  12.498 0.12 1.617 3.209 0.817 ...
##  $ cal.dens         : num  0 0.00217 0.01739 0.65219 0.13261 ...
##  $ cyc.dens         : num  11.985 0.262 1.935 1.644 0.417 ...
##  $ nau.dens         : num  0.0906 0.1174 0.5696 1.3131 0.5196 ...
##  $ asp.dens         : num  0.9962 0.0293 0.4783 0.687 0.2761 ...
##  $ holo.dens        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ poly.dens        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ bos.dens         : num  0 0.1022 0.0391 0.2 0.0652 ...
##  $ cerio.dens       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ ker.dens         : num  0 0.0174 0 0.113 1.487 ...
##  $ kell.dens        : num  0 0 0 0.7826 0.0804 ...
##  $ chy.dens         : num  0 0 0 0.0087 0 ...
##  $ temp..C          : num  12.8 14.8 17 16.3 18.5 ...
##  $ DO..perc         : num  73.8 82.1 81.3 75.6 73.1 81.6 84.5 76.7 78.9 73.1 ...
##  $ DO..mg.L         : num  7.92 8.24 7.81 7.41 7.33 8.45 8.4 7.16 7.41 7.35 ...
##  $ spc              : num  97.5 98.7 100.2 110.7 115.1 ...
##  $ cond             : num  74.7 79.4 84.9 92.3 93.6 ...
##  $ pH               : num  7.97 7.04 8.48 7.99 7.93 5.87 7.14 8.34 8.48 8.03 ...
##  $ DOC              : num  0.8 0.9 0.9 1.9 1.1 0.6 1.1 0.8 51.9 2 ...
##  $ chla_ug.L        : num  0.34 0.42 0.22 0.02 0.55 0.26 0.28 0.2 0.81 0.39 ...

counts$date.simple<-recode_factor(counts$time_point, 
                                "1" = "25-Jun",
                                "2" = "6-Jul",
                                "3" = "21-Jul",
                                "4" = "4-Aug",
                                "5" = "22-Aug")

# order the lakes by North to South or upstream in phyloseq 
counts$lake <- factor(counts$lake, levels = 
                                    c("Cooney", "Blue", "Virginia", "Convict", "Serene", "Eastern Brook"))

# make sample number (each sample is a different row)
counts$sample <- row.names(counts)


# make a new column at the end for total added density of all zoops at each time/lake combo?
counts_mod <- counts[,c(23:34)]%>%
  mutate(sum_of_rows = rowSums(.))
counts_mod
##    daphnia.dens   cal.dens   cyc.dens    nau.dens     asp.dens  holo.dens
## 1    12.4979441 0.00000000 11.9847435  0.09056481  0.996212935 0.00000000
## 2     0.1195689 0.00217398  0.2619646  0.11739490  0.029348726 0.00000000
## 3     1.6174409 0.01739184  1.9348419  0.56958268  0.478275531 0.00000000
## 4     3.2087940 0.65219391  1.6435286  1.31308373  0.686977581 0.00000000
## 5     0.8174164 0.13261276  0.4174041  0.51958114  0.276095420 0.00000000
## 6     0.5693508 2.14125405  0.3341842  6.21334989  0.012377191 0.00000000
## 7     0.3594291 2.29808211  0.2150914  2.34336452  0.000000000 0.00000000
## 8     3.5306462 3.27890670  1.8188177  1.86916563  0.044054409 0.00000000
## 9     1.2327283 0.06155323  0.5905783  0.01663601  0.011645207 0.00000000
## 10    1.9924259 0.49810647  2.9093946  0.10565895  0.000000000 0.00000000
## 11   14.2265231 0.20870205  0.2608776  3.13053075 19.844086580 0.00000000
## 12    5.0784165 0.08695919  0.5652347  0.20000613  8.565479965 0.00000000
## 13    3.4609757 0.17826633  0.9478551  0.30870511  1.817447018 0.00000000
## 14    1.1049941 0.33456113  0.2308943  0.23089430  0.000000000 0.00000000
## 15    3.5827185 0.00000000  0.3565327  0.66088983  0.043479594 0.00000000
## 16    3.1314293 1.28722787  6.8198322  6.88171820  0.000000000 0.14852629
## 17    4.3864221 1.02009816  2.7610657  4.01918675  0.176817014 0.07480720
## 18    1.4067154 1.10502133  1.2603313  0.41415985  0.000000000 0.09818445
## 19    1.2577966 0.06377560  1.3003137  2.63251515  0.000000000 0.00000000
## 20    4.2446207 0.00000000  1.3924340  0.86097064  0.007086178 0.00000000
## 21    0.5658144 3.92533772  0.8487217 10.07856983  5.587417658 0.00000000
## 22    0.1903187 0.55930403  0.1262318  0.53599969  0.007768112 0.00000000
## 23    1.0365897 1.94940759  0.0696217  0.04641447  0.000000000 0.75036720
## 24    1.0863823 1.45931947  0.0000000  0.11350262  0.000000000 1.05395295
## 25    2.4484138 2.69163368  0.0000000  0.32429321  0.000000000 0.28375656
## 26    7.9874139 0.09901753 15.5127461  9.50568270 18.219225170 0.00000000
## 27    2.1842102 0.10516568  3.5351846  3.17923926  3.583722627 0.00000000
## 28    6.5045472 0.56523472  3.2870573  2.78269400  0.095655106 0.00000000
## 29    0.2402248 0.13369975  0.6315411  0.10000307  0.140221690 0.00000000
## 30    2.2087634 0.00000000  3.3131450  0.04347959  0.034783675 0.00000000
##     poly.dens   bos.dens cerio.dens    ker.dens   kell.dens    chy.dens
## 1  0.00000000 0.00000000 0.00000000  0.00000000  0.00000000 0.000000000
## 2  0.00000000 0.10217704 0.00000000  0.01739184  0.00000000 0.000000000
## 3  0.00000000 0.03913163 0.00000000  0.00000000  0.00000000 0.000000000
## 4  0.00000000 0.20000613 0.00000000  0.11304694  0.78263269 0.008695919
## 5  0.00000000 0.06521939 0.00000000  1.48700211  0.08043725 0.000000000
## 6  0.00000000 0.23516663 0.00000000  0.00000000  0.00000000 0.000000000
## 7  0.00000000 0.02264120 0.00000000  0.08490451  0.03396180 0.000000000
## 8  0.00000000 1.49155641 0.00000000  0.90626212  0.00000000 0.000000000
## 9  0.00000000 0.05489883 0.00000000  0.05157163  0.00000000 0.000000000
## 10 0.00000000 0.06037654 0.00000000  0.61131248  0.31320331 0.000000000
## 11 0.00000000 0.92176739 0.00000000  0.05217551  0.12174286 0.000000000
## 12 0.00000000 0.89567963 0.00000000  0.06956735  0.06087143 0.000000000
## 13 0.00000000 0.71306534 0.00000000  0.02608776  0.23913777 0.004347959
## 14 0.00000000 0.03298490 0.00000000  0.16728056  0.20733366 0.000000000
## 15 0.00000000 1.75657559 0.00000000  6.96543091  5.35668595 0.008695919
## 16 0.00000000 5.02513955 0.00000000  0.18565787  0.25992101 0.000000000
## 17 0.03400327 0.04080393 0.00000000  0.08840851  0.34003272 0.013601309
## 18 0.00000000 0.02499240 0.00000000  0.00000000  0.00000000 0.000000000
## 19 0.00000000 0.02125853 0.09920649  0.01771544  0.70153163 0.031887801
## 20 0.00000000 0.39328288 0.00000000  0.00000000  0.13463738 0.017715445
## 21 0.00000000 0.10609021 0.35363403 53.99991623  0.00000000 0.070726806
## 22 0.00000000 0.03689853 0.32043460  0.34762299  0.00000000 0.001942028
## 23 0.04641447 0.17792212 2.13506545  0.30169403  0.03867872 0.015471489
## 24 0.00000000 0.12971729 4.54010500  0.17836127  0.21079059 0.000000000
## 25 0.00000000 0.00000000 3.11321486  0.36482987  0.00000000 0.000000000
## 26 0.00000000 5.74301663 0.00000000 14.88563506 13.66441888 0.000000000
## 27 0.00000000 1.39142278 0.00000000  3.91539899  4.53021371 0.000000000
## 28 0.00000000 0.03478368 0.12174286  0.26957348  0.93046331 0.000000000
## 29 0.00000000 0.01956582 0.00000000  0.15000460  0.00000000 0.000000000
## 30 0.00000000 0.07826327 0.62610615  0.36522859  0.18261429 0.017391837
##    sum_of_rows
## 1   25.5694653
## 2    0.6500199
## 3    4.6566645
## 4    8.6089596
## 5    3.7957685
## 6    9.5056827
## 7    5.3574747
## 8   12.9394092
## 9    2.0196115
## 10   6.4904782
## 11  38.7664058
## 12  15.5222150
## 13   7.6958881
## 14   2.3089430
## 15  18.7310090
## 16  23.7394524
## 17  12.9552466
## 18   4.3094047
## 19   6.1260009
## 20   7.0507472
## 21  75.5362286
## 22   2.1265205
## 23   6.5676470
## 24   8.7721314
## 25   9.2261419
## 26  85.6171560
## 27  22.4245578
## 28  14.5917517
## 29   1.4152608
## 30   6.8697758

# add this new column to the orginal "counts" df 
counts$dens.row.sums <- counts_mod$sum_of_rows

# make a new column for region category
counts$region <- as.factor(ifelse(counts$lake == "Eastern Brook" | 
                                    counts$lake == "Serene" |
                                  counts$lake == "Convict", "South", "North"))

#### Total densities
overall.dens.plot <- ggplot(counts, aes(x = date.simple, y = dens.row.sums, color = lake, group = lake)) +
  geom_point(size=2) + 
  geom_line() + 
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+ 
  labs( x = "Sampling Time Point", y = "Total Zooplankton Density (#/L)") +theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
### Lake and time effects
tot.dens.mod<-lm(dens.row.sums ~ time_point + lake, data = counts)
tot.dens.site.time.aov<-Anova(tot.dens.mod, type=2) #time point trend
#plot(allEffects(tot.dens.mod))

# posthoc just looking at time point, a trend
tot.dens.ph<-emmeans(tot.dens.mod, pairwise~time_point)
multcomp::cld(tot.dens.ph, Letters=letters)

### Env. effects
stepAIC(lm(dens.row.sums ~ latitude + elevation_m + DO..perc + log(DOC) + chla_ug.L + pH + cond + temp..C, data = counts))
tot.zoop.dens.env.mod<-lm(dens.row.sums ~ latitude + elevation_m + DO..perc + cond + temp..C, data = counts)
tot.zoop.dens.env.aov<-Anova(tot.zoop.dens.env.mod, type=2)
#plot(allEffects(lm(dens.row.sums ~ temp..C, data = counts)))
  • Best models are below for total zooplankton densities by Site x Time, and by environmental conditons.
  • These are in (BOTTOM) Table S1 and Table S2.
tot.dens.site.time.aov
## Anova Table (Type II tests)
## 
## Response: dens.row.sums
##            Sum Sq Df F value    Pr(>F)    
## time_point 6015.3  4  7.5821 0.0006901 ***
## lake       1373.8  5  1.3853 0.2718219    
## Residuals  3966.8 20                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tot.zoop.dens.env.aov
## Anova Table (Type II tests)
## 
## Response: dens.row.sums
##             Sum Sq Df F value Pr(>F)  
## latitude     783.8  1  2.4346 0.1318  
## elevation_m  636.9  1  1.9782 0.1724  
## DO..perc     731.1  1  2.2707 0.1449  
## cond         619.1  1  1.9230 0.1783  
## temp..C     2278.1  1  7.0761 0.0137 *
## Residuals   7726.7 24                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now, that we have examined the total density of zooplankton, we will test for differences in density of individual taxa.

  • first will look at Daphnia, calanoid copepods, cyclopoid copepods, nauplii, asplanchna, and other less common groups.
  • make dot-line plots and run linear models.
##### Daphnia
daphnia.dens.plot <- ggplot(counts, aes(x = date.simple, y = daphnia.dens, color = lake, group = lake)) +
  geom_point(size = 2) + 
  geom_line() + 
  labs( x = "Sampling Time Point", y = "Daphnia Density (#/L)") +
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

# model
daph.dens.mod<-lm(daphnia.dens ~ time_point + lake, data = counts)
daph.dens.aov<-Anova(daph.dens.mod, type=2) #time point trend
hist(residuals(daph.dens.mod))
plot(allEffects(daph.dens.mod))

# posthoc just looking at time point, a trend
daph.dens.ph<-emmeans(daph.dens.mod, pairwise~time_point)
multcomp::cld(daph.dens.ph, Letters=letters)
##  time_point emmean  SE df lower.CL upper.CL .group
##  4            1.36 1.2 20  -1.1402     3.85  a    
##  2            2.05 1.2 20  -0.4423     4.55  ab   
##  5            2.55 1.2 20   0.0537     5.04  ab   
##  3            2.93 1.2 20   0.4308     5.42  ab   
##  1            6.50 1.2 20   4.0011     8.99   b   
## 
## Results are averaged over the levels of: lake 
## Confidence level used: 0.95 
## P value adjustment: tukey method for comparing a family of 5 estimates 
## significance level used: alpha = 0.05 
## NOTE: If two or more means share the same grouping symbol,
##       then we cannot show them to be different.
##       But we also did not show them to be the same.


##### Calanoid
calanoid.dens.plot <- ggplot(counts, aes(x = date.simple, y = cal.dens, color = lake, group = lake)) +
  geom_point(size = 2) + 
  geom_line() + 
  labs( x = "Sampling Time Point", y = "Calanoid Density (#/L)") +
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

# model
calan.dens.mod<-lm(cal.dens ~ time_point + lake, data = counts)
calan.dens.aov<-Anova(calan.dens.mod, type=2) #lake effect
hist(residuals(calan.dens.mod))
plot(allEffects(calan.dens.mod))

# posthoc just looking at time point, a trend
calan.dens.ph<-emmeans(calan.dens.mod, pairwise~lake)
multcomp::cld(calan.dens.ph, Letters=letters)
##  lake          emmean    SE df lower.CL upper.CL .group
##  Blue           0.161 0.351 20  -0.5716    0.893  a    
##  Cooney         0.162 0.351 20  -0.5708    0.894  a    
##  Virginia       0.181 0.351 20  -0.5518    0.913  a    
##  Eastern Brook  0.695 0.351 20  -0.0372    1.428  ab   
##  Convict        1.656 0.351 20   0.9231    2.388  ab   
##  Serene         2.117 0.351 20   1.3845    2.849   b   
## 
## Results are averaged over the levels of: time_point 
## Confidence level used: 0.95 
## P value adjustment: tukey method for comparing a family of 6 estimates 
## significance level used: alpha = 0.05 
## NOTE: If two or more means share the same grouping symbol,
##       then we cannot show them to be different.
##       But we also did not show them to be the same.


#### Cyclopoid
cyclopoid.dens.plot <- ggplot(counts, aes(x = date.simple, y = cyc.dens, color = lake, group = lake)) +
  geom_point(size = 2) + 
  geom_line() + 
  labs( x = "Sampling Point", y = "Cyclopoid Density (#/L)") +
   scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

# model
cyclo.dens.mod<-lm(cyc.dens ~ time_point + lake, data = counts) # time_point
cyclo.dens.aov<-Anova(cyclo.dens.mod, type=2)
hist(residuals(cyclo.dens.mod))
plot(allEffects(cyclo.dens.mod))

# posthoc just looking at time point
cyclo.dens.ph<-emmeans(cyclo.dens.mod, pairwise~time_point)
multcomp::cld(cyclo.dens.ph, Letters=letters)
##  time_point emmean   SE df lower.CL upper.CL .group
##  4           0.733 1.13 20   -1.631     3.10  a    
##  2           1.244 1.13 20   -1.120     3.61  ab   
##  5           1.398 1.13 20   -0.966     3.76  ab   
##  3           1.553 1.13 20   -0.811     3.92  ab   
##  1           5.960 1.13 20    3.596     8.32   b   
## 
## Results are averaged over the levels of: lake 
## Confidence level used: 0.95 
## P value adjustment: tukey method for comparing a family of 5 estimates 
## significance level used: alpha = 0.05 
## NOTE: If two or more means share the same grouping symbol,
##       then we cannot show them to be different.
##       But we also did not show them to be the same.


#### nauplii
nauplii.dens.plot <- ggplot(counts, aes(x = date.simple, y = nau.dens, color = lake, group = lake)) +
  geom_point(size = 2) + 
  geom_line() + 
  labs( x = "Sampling Point", y = "Nauplii Density (#/L)") +
   scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

# model
nau.dens.mod<-lm(nau.dens ~ time_point + lake, data = counts)
nau.dens.aov<-Anova(nau.dens.mod, type=2) #time point signif
hist(residuals(nau.dens.mod))
plot(allEffects(nau.dens.mod))

# posthoc just looking at time point
nau.dens.ph<-emmeans(nau.dens.mod, pairwise~time_point)
multcomp::cld(nau.dens.ph, Letters=letters)
##  time_point emmean    SE df lower.CL upper.CL .group
##  5           0.419 0.768 20   -1.182     2.02  a    
##  4           0.734 0.768 20   -0.867     2.34  a    
##  3           0.998 0.768 20   -0.603     2.60  a    
##  2           1.733 0.768 20    0.131     3.33  a    
##  1           5.983 0.768 20    4.382     7.58   b   
## 
## Results are averaged over the levels of: lake 
## Confidence level used: 0.95 
## P value adjustment: tukey method for comparing a family of 5 estimates 
## significance level used: alpha = 0.05 
## NOTE: If two or more means share the same grouping symbol,
##       then we cannot show them to be different.
##       But we also did not show them to be the same.


##### asplanchna
asplanchna.dens.plot <- ggplot(counts, aes(x = date.simple, y = asp.dens, color = lake, group = lake)) +
  geom_point(size = 2) + 
  geom_line() + 
  labs( x = "Sampling Time Point", y = "Asplanchna Density (#/L)") +
   scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


# model
asp.dens.mod<-lm(log(asp.dens+0.001) ~ time_point + lake, data = counts)
asp.dens.aov<-Anova(asp.dens.mod, type=2) #time point trend, lake signif
hist(residuals(asp.dens.mod))
plot(allEffects(asp.dens.mod))

# posthoc just looking at time point, a trend
asp.dens.ph<-emmeans(asp.dens.mod, pairwise~lake)
multcomp::cld(asp.dens.ph, Letters=letters)
##  lake          emmean   SE df lower.CL upper.CL .group
##  Eastern Brook -5.454 1.13 20    -7.80    -3.11  a    
##  Convict       -5.120 1.13 20    -7.47    -2.77  a    
##  Serene        -4.748 1.13 20    -7.10    -2.40  a    
##  Blue          -1.178 1.13 20    -3.53     1.17  a    
##  Cooney        -0.857 1.13 20    -3.21     1.49  a    
##  Virginia      -0.689 1.13 20    -3.04     1.66  a    
## 
## Results are averaged over the levels of: time_point 
## Results are given on the log(mu + 0.001) (not the response) scale. 
## Confidence level used: 0.95 
## Note: contrasts are still on the log(mu + 0.001) scale. Consider using
##       regrid() if you want contrasts of back-transformed estimates. 
## P value adjustment: tukey method for comparing a family of 6 estimates 
## significance level used: alpha = 0.05 
## NOTE: If two or more means share the same grouping symbol,
##       then we cannot show them to be different.
##       But we also did not show them to be the same.


####### ####### ####### 
####### less common groups

#### Ceriodaphnia
ceriodaphnia <- ggplot(counts, aes(x = date.simple, y = cerio.dens, color = lake, group = lake)) +
  geom_point(size=2) + 
  geom_line()+
   scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


###### bosmina
bosmina.dens.plot <- ggplot(counts, aes(x = date.simple, y = bos.dens, color = lake, group = lake)) +
  geom_point(size=2) + 
  geom_line() +
   scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


###### kellicotia
kellicotia.dens.plot <- ggplot(counts, aes(x = date.simple, y = kell.dens, color = lake, group = lake)) +
  geom_point(size=2) + 
  geom_line() +
   scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


##### keratella
keratella.dens.plot <- ggplot(counts, aes(x = date.simple, y = ker.dens, color = lake, group = lake)) +
  geom_point(size=2) + 
  geom_line() +
   scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


#### holopedium
holopedium.dens.plot <- ggplot(counts, aes(x = date.simple, y = holo.dens, color = lake, group = lake)) +
  geom_point(size=2) + 
  geom_line() +
   scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


#### polyphemus
polyphemus.dens.plot <- ggplot(counts, aes(x = date.simple, y = poly.dens, color = lake, group = lake)) +
  geom_point(size=2) + geom_line() +
   scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


#### chydorus
chydorus.dens.plot <- ggplot(counts, aes(x = date.simple, y = chy.dens, color = lake, group = lake)) +
  geom_point(size=2) + geom_line() +
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  theme_classic()
  • Compile all  dot-line plots for total zooplantkon and the common zooplankton taxa (taxa level density)
####### make aggregated plot
combined.dens.plot <- ggarrange(
                           overall.dens.plot + theme(axis.title.y =element_text(size=8))+ guides(color="none"),
                           daphnia.dens.plot + theme(axis.title.y =element_text(size=8))+ guides(color="none"),
                           calanoid.dens.plot + theme(axis.title.y=element_text(size=8))+ guides(color="none"),
                           cyclopoid.dens.plot + theme(axis.title.y=element_text(size=8))+ guides(color="none"),
                           nauplii.dens.plot + theme(axis.title.y=element_text(size=8))+ guides(color="none"),
                           asplanchna.dens.plot + theme(axis.title.y=element_text(size=8))+ guides(color="none"),
                           labels = c("A", "B", "C", "D", "E", "F"), nrow = 2, ncol = 3)
                           #common.legend = TRUE, legend="right",
                           #align="hv", font.label = list(size=10, color="black", family=NULL, position = "top"))
combined.dens.plot
dev.print(pdf, "figures/FigS3.combined.dens.plot.pdf", height=7, width=8)
dev.off()
**Figure S3.** Changes in zooplankton densities across six mountain lakes across five sampling time points

Figure S3. Changes in zooplankton densities across six mountain lakes across five sampling time points

Plankton relative abundance

  • Create a stacked density plot for zooplankton relative abundance
long_counts <- tidyr::pivot_longer(data = counts, cols = c(23:34), 
                                   names_to = "Taxa", values_to= "Density")
long_counts <- as.data.frame(long_counts)
str(long_counts)
long_counts$Taxa<-as.factor(long_counts$Taxa)
long_counts$Rel.abund <- long_counts$Density/ long_counts$dens.row.sums

is.na(long_counts)<-sapply(long_counts, is.infinite)
long_counts[is.na(long_counts)]<-0

# try with just logging the densities ("value"), still an issue of zeros but let's just look
#mod2<-lm(log(Density+0.001) ~Taxa+time_point*lake, data = long_counts)
#Anova(mod2, type=3)

long_counts$Time.cont<-as.numeric(long_counts$time_point)

# area plot
zoop_abundances_area <- ggplot(long_counts, aes(x=Time.cont, y=Rel.abund, fill=Taxa)) +
  geom_area() +
  facet_grid(cols = vars(lake)) +
  xlab("Sampling Time Point") +ylab("Relative Abundance")  +
  scale_fill_manual(values = c("firebrick3", "lightsalmon", "gold", "darkorange", "darkseagreen2", "mediumseagreen","lightskyblue","plum", "royalblue3", "palegoldenrod", "tan", "gray54")) + theme_bw()
 
zoop_abundances_area
**Figure 3.** Relative abundances of zooplankton taxa (as proportion of total density per liter) in communities of six mountain lakes in California’s Eastern Sierra Nevada across five sampling time points.

Figure 3. Relative abundances of zooplankton taxa (as proportion of total density per liter) in communities of six mountain lakes in California’s Eastern Sierra Nevada across five sampling time points.

dev.copy(pdf, "figures/Fig3.zoop_abundances_areaplot.pdf", height=6, width=15)
dev.off()

Run models to test for differences in zooplankton community, using Shannon diversity. First look for the model that best explains the data using StepAIC, then run ANOVA on the best model.
- Note Interactions aren’t appropriate for these models since there is only n=1 in each lake/time combination. We therefore use Type II SS of main effects.

# Shannon diversity
zoop.comm.div <- as.data.frame(vegan::diversity(counts[,c(23:34)], index="shannon"))
zoop.comm.div <- cbind(counts, zoop.comm.div)
names(zoop.comm.div)[47] <- "Shannon"

zoop.shan.lake.time.mod<-lm(Shannon ~ time_point + lake, data = zoop.comm.div)
zoop.shan.lake.time.aov<-Anova(zoop.shan.lake.time.mod, type=2)
# no effect of time or lake on Shannon diversity of zoop community samples (p> 0.425)

stepAIC(lm(Shannon ~ latitude + elevation_m + DO..perc + log(DOC) + chla_ug.L + pH + cond + temp..C, data = zoop.comm.div))

shan.zoop.env.mod<-lm(Shannon ~ latitude + cond + temp..C, data = zoop.comm.div)
shan.zoop.env.aov<-Anova(shan.zoop.env.mod, type=2)
plot(allEffects(lm(Shannon ~ temp..C, data = zoop.comm.div)))
  • Best models are below for Shannon Diversity by Site x Time, and Shannon Diversity x environmental conditions.
  • These are in (TOP) Table S1 and Table S2.
zoop.shan.lake.time.aov
## Anova Table (Type II tests)
## 
## Response: Shannon
##             Sum Sq Df F value Pr(>F)
## time_point 0.31036  4  1.0125 0.4245
## lake       0.39515  5  1.0313 0.4262
## Residuals  1.53259 20
shan.zoop.env.aov
## Anova Table (Type II tests)
## 
## Response: Shannon
##            Sum Sq Df F value  Pr(>F)  
## latitude  0.20085  1  2.9110 0.09990 .
## cond      0.19360  1  2.8059 0.10590  
## temp..C   0.29925  1  4.3372 0.04727 *
## Residuals 1.79392 26                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Microbiome Data

Import Phyloseq data from DADA pipeline. The data will need to be cleaned and organized so that it can be processed for downstream analysis. - Modify the metadata to meet downstream needs - Once data has been cleaned up, export new phyloseq objects and their parts - Ultimate product is the non-rarified phyloseq object that can be processed in subsequent chunks - We wil inspect read depth and rarefaction curves to determine how to standardize sampling depth.

##########################################################
detach("package:MicrobiotaProcess", TRUE) # reload later, has conflicts with phyloseq

# ps.prune is the final phyloseq object from pipeline, can load it in here...
### or load in the raw data and re-assemble

ps.prune.LOCAL<-readRDS("output/dada proc/ps.prune_local.RDS") #8056 taxa of 328 samples

# idiot check: make sure to remove all mitochondria and chloroplast taxonomic IDs
ps.prune.LOCAL <- subset_taxa(ps.prune.LOCAL, Family!= "Mitochondria" | 
                        is.na(Family) & Class!="Chloroplast" | is.na(Class))

# chloroplasts sneaking in under Order in some cyanobacteria, looks good @ 8056 ASVs
ps.prune.LOCAL <- subset_taxa(ps.prune.LOCAL, Order!="Chloroplast"| is.na(Class))

# make sure any ASVs in the data that not in > 1 sample, removed
ps.prune.LOCAL <- prune_taxa(taxa_sums(ps.prune.LOCAL) > 1, ps.prune.LOCAL) 
# few sneaked in, @ 8040 ASVs in 328 samples
# summarize_phyloseq(ps.prune.LOCAL) <<< use this for reporting

# order the lakes by North to South or upstream in phyloseq 
sample_data(ps.prune.LOCAL)$Lake <- factor(sample_data(ps.prune.LOCAL)$Lake, levels = 
                                           c("Cooney", "Blue", "Virginia", "Convict", "Serene", "Eastern Brook"))


# export metadata and play
metaD<-microbiome::meta(ps.prune.LOCAL)

# adjust caps
metaD<-metaD %>% 
  dplyr::rename(
    sequencing_ID = Sequencing_ID,
    sample_type = Sample_Type,
    organism = Organism,
    time_point= Time.Point,
    lake = Lake,
    sample_ID = Original_ID
  )

# correct NAs to be "water"
metaD$organism<-as.character(metaD$organism) # need to do this to change the level
metaD$organism[is.na(metaD$organism)] = "Water"
metaD$organism<-as.factor(metaD$organism)
metaD$organism<-factor(metaD$organism, levels =c("Bosmina", "Ceriodaphnia", "Daphnia", "Holopedium", 
                                             "Cyclopoid", "Calanoid", "Large Calanoid",
                                             "Water")) # back to factor

metaD$organism<-recode_factor(metaD$organism, "Large Calanoid" = "Calanoid")

# order
metaD$organism<-factor(metaD$organism, levels =c("Daphnia", "Holopedium", 
                                   "Calanoid", "Cyclopoid", "Water", "Bosmina", "Ceriodaphnia"))

# format metadata (have to redo this if loading back in)
make.fac<-c("sample_ID", "sequencing_ID", "sample_type", "time_point", "lake", "Plate", "Plate_name", "Well", "sample_control")

metaD[make.fac]<-lapply(metaD[make.fac], factor) # make all these factors

# create phy group (Cladocera vs Copepoda vs water) column
metaD$phy_group <- as.factor(ifelse(metaD$organism == "Calanoid" | 
                            metaD$organism == "Cyclopoid", "copepoda", 
                   ifelse(metaD$organism == "Daphnia" | 
                            metaD$organism == "Ceriodaphnia" | 
                            metaD$organism =="Bosmina" | 
                            metaD$organism == "Holopedium", "cladocera", 
                            "water")))

# create basin column
metaD$basin <- as.factor(ifelse(metaD$lake == "Eastern Brook" | 
                                metaD$lake == "Serene" | 
                                metaD$lake == "Convict", "South", 
                                "North"))

# create latitude column
metaD$latitude <- as.numeric(ifelse(metaD$lake == "Blue", "38.050952", 
                            (ifelse(metaD$lake == "Convict", "37.590094",
                            (ifelse(metaD$lake == "Cooney", "38.04806",
                            (ifelse(metaD$lake == "Eastern Brook", "37.431526", 
                            (ifelse(metaD$lake == "Serene", "37.438392" ,
                            (ifelse(metaD$lake == "Virginia", "38.046975",
                                  "0"))))))))))))

# create longitude column
metaD$longitude <- as.numeric(ifelse(metaD$lake == "Blue", "-119.270331", 
                            (ifelse(metaD$lake == "Convict", "-118.857422",
                            (ifelse(metaD$lake == "Cooney", "-119.27702",
                            (ifelse(metaD$lake == "Eastern Brook", "-118.742615", 
                            (ifelse(metaD$lake == "Serene", "-118.744386" ,
                            (ifelse(metaD$lake == "Virginia", "-119.265076",
                                  "0"))))))))))))
                            

##### Read counts and rarefaction curves
# Make a data frame with a column for the read counts of each sample
sample_sum_df<-as.data.frame(sample_sums(ps.prune.LOCAL))
colnames(sample_sum_df)<-"read.sum"
sample_sum_df$sampleNames<-rownames(sample_sum_df)

# are row names the same? if so, add them in
identical(rownames(sample_sum_df), rownames(metaD))
metaD$read.sum<-sample_sum_df$read.sum

# add in dates, as per metadata
metaD$date<-ifelse(metaD$time_point =="1" | metaD$lake =="Blue", "6/29/2022",
                            ifelse(metaD$time_point =="2" | metaD$lake =="Blue", "7/09/2022",
                            ifelse(metaD$time_point =="3" | metaD$lake =="Blue", "7/21/2022",
                            ifelse(metaD$time_point =="4" | metaD$lake =="Blue", "8/04/2022",
                            ifelse(metaD$time_point =="5" | metaD$lake =="Blue", "8/22/2022",
                    
                     ifelse(metaD$time_point =="1" | metaD$lake =="Convict", "6/25/2022",
                            ifelse(metaD$time_point =="2" | metaD$lake =="Convict", "7/07/2022",
                            ifelse(metaD$time_point =="3" | metaD$lake =="Convict", "7/22/2022",
                            ifelse(metaD$time_point =="4" | metaD$lake =="Convict", "8/05/2022",
                            ifelse(metaD$time_point =="5" | metaD$lake =="Convict", "8/23/2022",
                          
                     ifelse(metaD$time_point =="1" | metaD$lake =="Cooney", "6/29/2022",
                            ifelse(metaD$time_point =="2" | metaD$lake =="Cooney", "7/09/2022",
                            ifelse(metaD$time_point =="3" | metaD$lake =="Cooney", "7/21/2022",
                            ifelse(metaD$time_point =="4" | metaD$lake =="Cooney", "8/04/2022",
                            ifelse(metaD$time_point =="5" | metaD$lake =="Cooney", "8/22/2022", 
                                
                     ifelse(metaD$time_point =="1" | metaD$lake =="Eastern Brook", "6/26/2022",
                            ifelse(metaD$time_point =="2" | metaD$lake =="Eastern Brook", "7/06/2022",
                            ifelse(metaD$time_point =="3" | metaD$lake =="Eastern Brook", "7/22/2022",
                            ifelse(metaD$time_point =="4" | metaD$lake =="Eastern Brook", "8/09/2022",
                            ifelse(metaD$time_point =="5" | metaD$lake =="Eastern Brook", "8/23/2022",
                                   
                    ifelse(metaD$time_point =="1" | metaD$lake =="Serene", "6/26/2022",
                            ifelse(metaD$time_point =="2" | metaD$lake =="Serene", "7/06/2022",
                            ifelse(metaD$time_point =="3" | metaD$lake =="Serene", "7/22/2022",
                            ifelse(metaD$time_point =="4" | metaD$lake =="Serene", "8/09/2022",
                            ifelse(metaD$time_point =="5" | metaD$lake =="Serene", "8/23/2022",
                                  
                    ifelse(metaD$time_point =="1" | metaD$lake =="Virginia", "6/29/2022",
                            ifelse(metaD$time_point =="2" | metaD$lake =="Virginia", "7/09/2022",
                            ifelse(metaD$time_point =="3" | metaD$lake =="Virginia", "7/21/2022",
                            ifelse(metaD$time_point =="4" | metaD$lake =="Virginia", "8/04/2022",
                            "8/22/2022"
                            )))))))))))))))))))))))))))))

# adjust formatting to be Date (from)
metaD$date<-as.Date(mdy(metaD$date))

metaD<- metaD %>%
  dplyr::select(sequencing_ID, sampleNames, sample_ID, read.sum, time_point, date, lake, latitude, longitude, basin, sample_type, organism, Number.of.Organism, phy_group)

   
# the metadata above is now up to date with info necessary for downstream analysis
# now merge in the environmental data 

################################## ##################################
# read in the environmental data
env.metad<-read.csv("data/full.library.env.metaD.csv")

# make a column for 'Samplenames' and use this as rownames
env.metad$sampleNames<-env.metad$sequencing_ID
env.metad$sampleNames <- gsub("^.{0,3}", "", env.metad$sampleNames)
rownames(env.metad)<- env.metad$sampleNames

# remove the rows not found in the meta-meta data -- these are samples lost in dada2/controls
env.metad<-env.metad[(env.metad$sampleNames %in% metaD$sampleNames),]

env.metad<- env.metad %>%
  dplyr::select(sample_ID, sequencing_ID, sampleNames, sample_type, organism, Number.of.Organism, 
                time_point, lake, elevation_m, temp_C, chla_ug.L, pH, DO_perc, spc, cond, DOC, TDN, TDP)

## replace outliers in nutrient analysis
env.metad<-env.metad %>% 
  naniar::replace_with_na(replace = list(TDP = 4102.48))


################################## ##################################
# no merge, subset to make it easier to merge...
env.metad.reduce<-env.metad %>%
  dplyr::select(sampleNames, elevation_m, temp_C, chla_ug.L, pH, DO_perc, spc, cond, DOC, TDN, TDP)


# merge the two = final metadata, add in rownames for phyloseq merge
MetaD.SQ.Env<-merge(metaD, env.metad.reduce, by="sampleNames", all.x=TRUE)
rownames(MetaD.SQ.Env)<-MetaD.SQ.Env$sampleNames

# make new column for above or below treeline
MetaD.SQ.Env$elev.cat <- as.factor(ifelse(MetaD.SQ.Env$elevation_m < 2900, "Below", "Above"))

# are row names the same? if so, let's boogie
identical(rownames(sample_data(ps.prune.LOCAL)), rownames(MetaD.SQ.Env))
write.csv(MetaD.SQ.Env, "output/MetaD.SQ.Env.csv")

#updated metadata back into phyloseq
sample_data(ps.prune.LOCAL)<-MetaD.SQ.Env
PS.fin<- ps.prune.LOCAL

table(sample_data(PS.fin)$sample_type) #328 samples: 30 water, 298 zooplankton

# change to this order for later plot tests-- ultimately Bosmina and Ceriodaphnia are dropped nad large calanoids grouped into calanoids...
sample_data(PS.fin)$organism<-factor(sample_data(PS.fin)$organism, levels =c("Daphnia", "Holopedium", 
                                   "Calanoid", "Cyclopoid", "Water", "Bosmina", "Ceriodaphnia")) # back to factor

table(phyloseq::tax_table(PS.fin)[, "Kingdom"], exclude = NULL) # 289 Archaea, 7751 Bacteria
summarize_phyloseq(PS.fin)

########### ########### ########### ########### 
saveRDS(PS.fin, "output/dada proc/PS.fin.RDS") # save phyloseq
########### ########### ########### ########### 

#### export each element
OTU = as(otu_table(PS.fin), "matrix")
if(taxa_are_rows(PS.fin)){OTU <- t(OTU)}
# Coerce to data.frame
OTUdf = as.data.frame(OTU)

Tax.tab = as(phyloseq::tax_table(PS.fin), "matrix")
if(taxa_are_rows(PS.fin)){Tax.tab <- t(Tax.tab)}
# Coerce to data.frame
TAXdf = as.data.frame(Tax.tab)

Sam.tab = as(sample_data(PS.fin), "matrix")
if(taxa_are_rows(PS.fin)){Sam.tab <- t(Sam.tab)}
# Coerce to data.frame
SAMdf = as.data.frame(Sam.tab)

write.csv(OTUdf, "output/phyloseq output/OTUdf.nonrar.csv")
write.csv(TAXdf, "output/phyloseq output/TAXdf.nonrar.csv")
write.csv(SAMdf, "output/phyloseq output/SAMdf.nonrar.csv")

## with bosmina and ceriodaphnia still in dataset, and pre-rarefaction...
microbiome::summarize_phyloseq(PS.fin)
#### Inspect the run: 
# what is the # of reads/sample? 
# A bit over inflated due to water being so deeply sequenced, but log-read sum gives us a good picture

# reads per sample
read.sample<-ggplot(MetaD.SQ.Env, aes(x=sample_type, y=read.sum, color=organism)) + geom_boxplot() + theme_bw() +
  scale_color_manual(values=c("coral2", "gold3", "mediumaquamarine", "springgreen4", 
                              "dodgerblue", "violet", "wheat3"))

### log reads per sample
log.read.sample<-ggplot(MetaD.SQ.Env, aes(x=organism, y=log(read.sum), color=organism)) + geom_boxplot() + theme_bw() +
  scale_color_manual(values=c("coral2", "gold3", "mediumaquamarine", "springgreen4", 
                              "dodgerblue", "violet", "wheat3"))

# read depth a bit low for zooplankton vs water, average depth is less than 2000 reads
ggplot(MetaD.SQ.Env[(MetaD.SQ.Env$sample_type=="Zooplankton"),], aes(x=sample_type, y=read.sum, color=organism)) +
  geom_boxplot() +  theme_bw() +
  scale_color_manual(values=c("coral2", "gold3", "mediumaquamarine", "springgreen4", 
                              "dodgerblue", "violet", "wheat3"))
# Histogram of sample read counts
hist.depth <- ggplot(MetaD.SQ.Env, aes(read.sum)) + 
  geom_histogram(color="gold4", fill= "goldenrod2", binwidth = 10000) + 
  xlab("Read counts") +
  ggtitle("Sequencing Depth") + theme_classic()

# a different view
#read_depth_violin <- ggplot(MetaD.SQ.Env, aes(x=1, y=read.sum)) +
#  geom_violin() + ggtitle("Distribution of sample sequencing depth") + scale_y_log10() +
#  xlab("Samples") + ylab("Read Depth") + geom_boxplot(width=0.1)

# Rarefaction curve
count_table_filt <- as.data.frame(otu_table(PS.fin))
rarecurve((count_table_filt), step=50, cex=0.5, xlim=c(0,100000), ylab = "ASVs", label=FALSE, col="gold4")
#abline(v = 500, lty = "dotted", col="red", lwd=2)

dev.copy(pdf, "figures/FigS2.rare.raw.pdf", height=4.5, width=7)
dev.off() 
**Figure S2**. Rarefaction curve of number of bacterial ASVs as a function of sequencing depth (the number of reads per sample).

Figure S2. Rarefaction curve of number of bacterial ASVs as a function of sequencing depth (the number of reads per sample).

Now we will subset the phyloseq objects and compare the influence of rarefaction for the phyloseq data.

  • PS.fin is the final phyloseq object with all the data.
  • PS.500 is rarefied to 500 read-depth (with low abundance samples removed).
  • PS.zoop is rarefied zooplankton data (500 read depth).
  • PS.water is rarefied water only (500 read depth).
###### subset the PS objects for needs...
# final and NOT rarefied data: 
PS.fin
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 8040 taxa and 328 samples ]
## sample_data() Sample Data:       [ 328 samples by 25 sample variables ]
## tax_table()   Taxonomy Table:    [ 8040 taxa by 7 taxonomic ranks ]
## refseq()      DNAStringSet:      [ 8040 reference sequences ]


# rarified to 500 reads: PS.rar, then reclassify and reduce some taxa and name "PS.500'
# 44 samples removedbecause they contained fewer reads than `sample.size`
# 5460 OTUs were removed because they are no longer present
PS.rar = rarefy_even_depth(PS.fin, rngseed=111, sample.size=500, replace=F) 
PS.rar <- prune_taxa(taxa_sums(PS.rar) >0, PS.rar) # remove those asvs not in at least 1 sample again

table(sample_data(PS.rar)$sample_type) #284 samples: 28 water, 256 zooplankton, 1031 ASVs
## 
##       Water Zooplankton 
##          28         256
summarize_phyloseq(PS.rar)
## [[1]]
## [1] "1] Min. number of reads = 500"
## 
## [[2]]
## [1] "2] Max. number of reads = 500"
## 
## [[3]]
## [1] "3] Total number of reads = 142000"
## 
## [[4]]
## [1] "4] Average number of reads = 500"
## 
## [[5]]
## [1] "5] Median number of reads = 500"
## 
## [[6]]
## [1] "7] Sparsity = 0.977898241052639"
## 
## [[7]]
## [1] "6] Any OTU sum to 1 or less? YES"
## 
## [[8]]
## [1] "8] Number of singletons = 763"
## 
## [[9]]
## [1] "9] Percent of OTUs that are singletons \n        (i.e. exactly one read detected across all samples)38.9087200407955"
## 
## [[10]]
## [1] "10] Number of sample variables are: 25"
## 
## [[11]]
##  [1] "sampleNames"        "sequencing_ID"      "sample_ID"         
##  [4] "read.sum"           "time_point"         "date"              
##  [7] "lake"               "latitude"           "longitude"         
## [10] "basin"              "sample_type"        "organism"          
## [13] "Number.of.Organism" "phy_group"          "elevation_m"       
## [16] "temp_C"             "chla_ug.L"          "pH"                
## [19] "DO_perc"            "spc"                "cond"              
## [22] "DOC"                "TDN"                "TDP"               
## [25] "elev.cat"


###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### 
# Compare full data, ordinations, and rarefied beta diversity
###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### 


###################### full data, non-rarified DCA ###################### 
# check beta diversity heterogeneity using DETRENDED CORRESPONDANCE ANALYSIS (DCA)
# can be measured by DCA (the length of the first DCA axis is a measure of heterogeneity
# DCA1 > 4 = the dataset is considered as heterogeneous and suitable for unimodal methods (RDA, CCA)
# DCA1 < 3 = it can be considered as homogeneous, suitable for linear methods (PCA, RDA)
# DCA1 =  3-4, both linear or unimodal methods are ok. After this decision, use either linear (PCA, RDA)

#DCA.full<-ordinate(PS.fin, method='DCA') # DCA-axis1 length = 6
#plot_ordination(PS.fin, DCA.full, color = "sample_type",  type='split')


###################### full data, non-rarified PCoA ###################### 
# distance based (db) PCoA (principal coordinate analysis)
# PCoA is preferred if wanting to interpret the absolute distances between communities in different samples.
PCoA.full <- ordinate(PS.fin, method = 'PCoA', distance = 'bray')
plot_ordination(PS.fin, PCoA.full, color = "sample_type")


###################### transformation based ordination (tb), ###################### 
# bypass the hetero/homogeneity issue, transform  species composition data by Hellinger transformation
# use either unconstrained (tb-PCA) or constrained (tb-RDA) approach
# Hellinger transformation in un-even community datasets better linearizes the distances among sites 
# ... and reduce the effect of dominant species in the site dissimilarity

####  tb-PCA, 
# Principal Components Analysis is an unconstrained method that does not use a distance matrix. 
# PCA directly uses the (transformed) microbial variables
# package will hellinger transform

library("MicrobiotaProcess") # note this has issues with phyloseq so need to unload it after
PCA.hell.nonrar <- get_pca(PS.fin, method="hellinger")
PCA.hell.nonrar.plot <- ggordpoint(PCA.hell.nonrar, biplot=TRUE, 
                      speciesannot=TRUE,
                      factorNames=c("organism"), ellipse=TRUE)

#### ####  ####  rarefied, manually hellinger transformed: Constrained ordination (tb-RDA)
# transform the rarified (500 reads) data and Hellinger transform
# convert the data to proportions and then take the square root
# do RDA/PCA  here, since non-distance based ordination

PS.rar.hell.test = transform_sample_counts(PS.rar, function(x) sqrt(x / sum(x)))
ord.hell.RDA <- ordinate(PS.rar.hell.test, method = 'RDA')

Hell.tb.RDA<- plot_ordination(
  physeq=PS.rar.hell.test, 
  ordination = ord.hell.RDA) +
  geom_point(aes(color = organism), size = 2) +    
  scale_color_manual(values=c("coral", "goldenrod2", "darkolivegreen3", "darkslategray4",
                              "dodgerblue", "violet", "wheat3")) +
  stat_ellipse(level=0.95, linetype = 2, aes(color=organism)) +
  ggtitle("ALL taxa: tb-RDA (Hell-500)") +
  theme_classic() 

#############################

# what samples have low reads
plot_ordination(PS.fin, PCoA.full, shape = "sample_type") +
  geom_point(aes(color = read.sum > 500))

Rare.culling<-plot_ordination(PS.fin, PCoA.full, color = "organism") +
  geom_point(aes(shape = read.sum > 500, size= read.sum > 500)) +
  scale_size_manual(values= c(6,2)) +
  scale_shape_manual(values=c(1, 16))+
  scale_color_manual(values=c("coral", "goldenrod2", "darkolivegreen3", "darkslategray4",
                              "dodgerblue", "violet", "wheat3")) +
  ggtitle("Samples culled by rarefaction") + theme_classic()


# plot the PCoA for non-rarified
Non.rar.PCoA<-plot_ordination(
  physeq = PS.fin,                                                   
  ordination = PCoA.full) +                                                
  geom_point(aes(color = organism), size = 2) +    
  stat_ellipse(level=0.95, linetype = 2, aes(color=organism)) +
  scale_color_manual(values=c("coral", "goldenrod2", "darkolivegreen3", "darkslategray4", 
                              "dodgerblue", "violet", "wheat3")) +
  ggtitle("ALL taxa-all data") +
  theme_classic() 

###### if rarefied in a distance based approach, PCoA
ord.rar <- ordinate(PS.rar, method = 'PCoA', distance = 'bray')
plot_ordination(PS.rar, ord.rar, color = "sample_type")

Rar.PCoA<- plot_ordination(
  physeq = PS.rar,                                                   
  ordination = ord.rar) +                                                
  geom_point(aes(color = organism), size = 2) +    
  stat_ellipse(level=0.95, linetype = 2, aes(color=organism)) +
  scale_color_manual(values=c("coral", "goldenrod2","darkolivegreen3", "darkslategray4",
                              "dodgerblue", "violet", "wheat3")) +
  ggtitle("ALL taxa- 500reads") +
  theme_classic() 



########### PCoA - unifrac/weighted
# distance based ordination on rarified data
# UniFrac is a β-diversity measure that uses phylogenetic information to compare environmental samples
# Unweighted UniFrac is more sensitive to differences in low-abundance features
# Weighted UniFrac for abundance and phylogeny, impact of low-abundance features is diminished, is useful for examining differences in community structure

# make phylogenetic tree and add to phyloseq if wanting to do unifrac weight/unweighted
library("ape")
random_tree = rtree(ntaxa(PS.rar), rooted=TRUE, tip.label=taxa_names(PS.rar))

# add in
phy_tree(PS.rar)<-random_tree

# unweighted since we have low abundance taxa
ord.unif <- ordinate(PS.rar, method = 'PCoA', distance ="unifrac")

PCoA.rar.UnW.unif<- plot_ordination(
  physeq=PS.rar, 
  ordination = ord.unif) +
  geom_point(aes(color = organism), size = 2) +    
  stat_ellipse(level=0.95, linetype = 2, aes(color=organism)) +
  scale_color_manual(values=c("coral", "goldenrod2", "darkolivegreen3", "darkslategray4",
                              "dodgerblue", "violet", "wheat3")) +
  ggtitle("ALL taxa: PCoA 500.UnW.unifrac") +
  theme_classic() 

# weighted for dominant taxa
ord.wunif <- ordinate(PS.rar, method = 'PCoA', distance ="wunifrac")

PCoA.rar.w.unif<- plot_ordination(
  physeq=PS.rar, 
  ordination = ord.wunif) +
  geom_point(aes(color = organism), size = 2) +    
  stat_ellipse(level=0.95, linetype = 2, aes(color=organism)) +
  scale_color_manual(values=c("coral", "goldenrod2", "darkolivegreen3", "darkslategray4", 
                              "dodgerblue", "violet", "wheat3")) +
  ggtitle("ALL taxa: PCoA 500.W.unifrac") +
  theme_classic() 

## inspect sample size effects for samples with more or less than average # individuals
Rare.number.org<-plot_ordination(PS.fin, PCoA.full, color = "Number.of.Organism") +
  geom_point(aes(color= Number.of.Organism, shape=organism), size=2) +
  scale_colour_gradient(low = "gold", high = "dodgerblue") +
  scale_shape_manual(values=c(16,17,18,1,2,3,4))+
  ggtitle("Organism number rarefaction") + theme_classic() 
Rare.number.org$layers <- Rare.number.org$layers[-1]
####
ordination.tests<-plot_grid(Rare.culling + guides(color="none"), 
                      Non.rar.PCoA +  guides(color="none"), 
                      Rar.PCoA,   
                      Rare.number.org,
                      PCoA.rar.w.unif +  guides(color="none"),
                      Hell.tb.RDA + guides(color="none"), ncol=3, rel_widths=c(5,4,4,5,4,4))

ordination.tests
dev.copy(pdf, "figures/Ordination.tests.pdf", height=10, width=15)
dev.off()
#######

# unload 'MicrobiotaProcess'
detach("package:MicrobiotaProcess", TRUE)
**Exploratory Figure**. Testing different approaches and identifying samples with low reads to be dropped.

Exploratory Figure. Testing different approaches and identifying samples with low reads to be dropped.

After running the tests of rarifying data transformation influence our data (and seeing the low sample size for Bosmina and Ceriodaphnia), we will proceed with a rarified dataset and drop these two species. The variation in DCA shows we can proceed with assumption of heterogeneous distributions and use RDA/CCA, use PCoA/NMDS with Bray-Curtis, or proceed with rank-based ordination approached (NMDS) with fewer axis.

We will use rarified data, and demonstrate ordination differences using PCoA with unweighted unifrac distance (accounting for abundance and phylogeny) and NMDS-Bray Curtis for maximal distance between groups.

################### final phyloseq object based on the above

# give it a redo now that we know we don't want other levels here...

# this PS.500 is used for the richness data -- non-rarefied
# first, use the PS.fin data and remove the samples no longer wanted
PS.rm = subset_samples(PS.fin, !organism=="Bosmina" & !organism=="Ceriodaphnia") # removes 12 samples
PS.rm <- prune_taxa(taxa_sums(PS.rm) >1, PS.rm) # remove those asvs not in > 1 sample (again)

# removes 35 samples and 5467 OTUs
PS.500 = rarefy_even_depth(PS.rm, rngseed=111, sample.size=500, replace=F) 
PS.500 <- prune_taxa(taxa_sums(PS.500) >0, PS.500) # remove those asvs not in > 0 sample, in the final data
# 1651 taxa and 281 samples

table(sample_data(PS.500)$sample_type)
#281 samples: 28 water, 253 zooplankton, 1651 ASVs

# check sample data
table(sample_data(PS.500)$organism) 
# Calanoid    Daphnia Holopedium  Cyclopoid      Water 
#       74         98          8         73         28 

# set order of organims
sample_data(PS.500)$organism<-factor(sample_data(PS.500)$organism, 
                                     levels =c("Daphnia", "Holopedium", 
                                            "Calanoid", "Cyclopoid", "Water")) # back to factor
summarize_phyloseq(PS.500)
##########################

#### export each element
OTU = as(otu_table(PS.500), "matrix")
if(taxa_are_rows(PS.500)){OTU <- t(OTU)}
# Coerce to data.frame
OTUdf.rar = as.data.frame(OTU)

Tax.tab = as(phyloseq::tax_table(PS.500), "matrix")
if(taxa_are_rows(PS.500)){Tax.tab <- t(Tax.tab)}
# Coerce to data.frame
TAXdf.rar = as.data.frame(Tax.tab)

Sam.tab = as(sample_data(PS.500), "matrix")
if(taxa_are_rows(PS.500)){Sam.tab <- t(Sam.tab)}
# Coerce to data.frame
SAMdf.rar = as.data.frame(Sam.tab)

write.csv(OTUdf.rar, "output/phyloseq output_rarified/OTUdf.rar.csv")
write.csv(TAXdf.rar, "output/phyloseq output_rarified/TAXdf.rar.csv")
write.csv(SAMdf.rar, "output/phyloseq output_rarified/SAMdf.rar.csv")

########### ########### ########### ########### 
saveRDS(PS.500, "output/dada proc/PS.500.RDS") # save phyloseq
########### ########### ########### ########### 


# pull the OTU table and sample data from phyloseq so we can run these analyses in vegan
# if needed, this is quite useful....

# convert the sample_data() within a phyloseq object to a vegan compatible data object
pssd2veg <- function(physeq) {
  sd <- sample_data(physeq)
  return(as(sd,"data.frame"))
}

# convert the otu_table() within a phyloseq object to a vegan compatible data object
psotu2veg <- function(physeq) {
  OTU <- otu_table(physeq)
  if (taxa_are_rows(OTU)) {
    OTU <- t(OTU)
  }
  return(as(OTU, "matrix"))
}

Richness (observed and shannon) using PS.500 = rarified phyloseq to 500 reads/sample.
- look at richness by sample type (water vs. zooplankton).
- look at how time and site effect richness. - how does richness differ for individual taxa compared to water (it is much lower…). - plots here will be in Figure 4 A-D, printed below with beta dispersion.

#richness by organism (or sample type i.e, water) -- not transformed, but rarefied

## plots
richness.plot.samples <- plot_richness(PS.500, x="sample_type", measures=c("Observed", "Shannon"), color="sample_type") +
  labs(y= "Alpha Diversity Measure", x = "Sample Type") + theme_bw() + 
  geom_boxplot(aes(fill=sample_type), alpha=0.7) +
  geom_point(aes(color=sample_type), alpha=0.5) +
  scale_color_manual(values = c("lightsteelblue3", "navajowhite3"))+ guides(color="none") +
  scale_fill_manual(values = c("lightsteelblue3", "navajowhite2"))+ guides(fill="none") +
  theme(axis.text.x=element_text(size=9)) + theme(strip.text.x = element_text(size = 12))

richness.plot.samples$layers <- richness.plot.samples$layers[-1] # remove the first layer of points 
richness.plot.samples
#####
#richness by time_point
alpha.plot.time <- plot_richness(PS.500, x="time_point", measures=c("Shannon")) + 
  labs(y= "Shannon Diversity", x = "Sampling Time Point") + theme_bw() + 
  geom_point(color="gray15", alpha=0.5) +
  coord_cartesian(ylim=c(0, 5))+
  geom_boxplot(fill= "gray85", alpha=0.5) + 
  theme(strip.text.x = element_text(size = 12))

alpha.plot.time$layers <- alpha.plot.time$layers[-1] # remove the first layer of points 
alpha.plot.time
#Shannon diversity by Lake
alpha.plot.location <- plot_richness(PS.500, x="lake", measures=c("Shannon")) +
  labs(y= "Shannon Diversity", x = "Lake") + theme_bw() + 
  geom_point(color="gray15", alpha=0.5) +
  coord_cartesian(ylim=c(0, 5))+
  geom_boxplot(fill= "gray15", alpha=0.5)+ 
  theme(strip.text.x = element_text(size = 12)) +
  theme(axis.text.x = element_text(angle = 20, vjust = 0.5, hjust=1))

alpha.plot.location$layers <- alpha.plot.location$layers[-1] # remove the first layer of points 
alpha.plot.location
# group the above two plots
alpha.time.lake<-plot_grid(alpha.plot.time, alpha.plot.location)

#####
#richness by organisms (zoop species and water)
#Shannon
richness.spp.shan <- plot_richness(PS.500, x="organism", color="organism", measures=c("Shannon")) +
  labs(y= "Alpha Diversity", x = "Zooplankton and Water Samples") + 
  coord_cartesian(ylim=c(0, 5))+
  geom_boxplot(aes(fill=organism), alpha=0.4) + guides(fill="none") + guides(color="none")+
  geom_point(alpha=0.5) +
  scale_fill_manual(values=c("coral2", "gold3", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  scale_color_manual(values=c("coral2", "gold3", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  theme(axis.text.x=element_blank(), strip.text.x = element_text(size = 12)) + theme_bw() +
  theme(axis.text.x = element_text(angle = 20, vjust = 0.5, hjust=1))

richness.spp.shan$layers <- richness.spp.shan$layers[-1] # remove the first layer of points 
richness.spp.shan
#richness by organisms (zoop species and water)
#observed
richness.spp.obs <- plot_richness(PS.500, x="organism", color="organism", measures=c("Observed")) +
  labs(y= "Alpha Diversity", x = "Zooplankton and Water Samples") + 
  coord_cartesian(ylim=c(0, 150))+
  geom_boxplot(aes(fill=organism), alpha=0.4) + guides(fill="none") + guides(color="none")+
  geom_point(alpha=0.5) +
  scale_fill_manual(values=c("coral2", "gold3", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  scale_color_manual(values=c("coral2", "gold3", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  theme(axis.text.x=element_blank(), strip.text.x = element_text(size = 12)) + theme_bw()+
   theme(axis.text.x = element_text(angle = 20, vjust = 0.5, hjust=1))

richness.spp.obs$layers <- richness.spp.obs$layers[-1] # remove the first layer of points 
richness.spp.obs
extract.legend.richness <- get_legend(
  # create some space to the left of the legend
  richness.spp.obs + theme(legend.box.margin = margin(0, 0, 0, 10)))

Use Shannon Diversity and Alpha richness to test differences across sampling factors - Use Shannon and Observed and test for time, lake, and organisms (species and or water) effects on alpha diversity

# some stats
Shan.rich<-estimate_richness(PS.500, measures =c("Observed", "Shannon"))
rownames(Shan.rich) <- substr(rownames(Shan.rich),2,nchar(rownames(Shan.rich))) # to remove the "X"
Shan.rich.df<-merge(data.frame(sample_data(PS.500)), Shan.rich, by = "row.names") # merge 

#### SHANNON
# straight 3 way no-interaction (bc of missing samples post filtering)
rich.shan.reduced<- lm(Shannon~ time_point+lake+organism, data=Shan.rich.df)
Anova(rich.shan.reduced, type=2) #lake and "organism" significant
#plot(allEffects(rich.shan.reduced))

#  posthoc just looking at Lake (p=<0.001)
rich.test.lake<-emmeans(rich.shan.reduced, pairwise~lake)
multcomp::cld(rich.test.lake, Letters=letters)
# lake          emmean    SE  df lower.CL upper.CL .group
# Serene          1.94 0.120 274     1.70     2.17  a    
# Eastern Brook   2.07 0.115 274     1.85     2.30  a   
# Virginia        2.47 0.108 274     2.26     2.69    b  
# Convict         2.57 0.121 274     2.33     2.81    b  
# Cooney          2.68 0.109 274     2.47     2.89    b  
# Blue            2.71 0.106 274     2.50     2.91    b 


rich.test.group<-emmeans(rich.shan.reduced, pairwise~organism)
multcomp::cld(rich.test.group, Letters=letters)

#organism   emmean     SE  df lower.CL upper.CL .group
#Calanoid     2.03 0.0648 270     1.90     2.16  a    
#Holopedium   2.12 0.2062 270     1.72     2.53  ab   
#Cyclopoid    2.19 0.0667 270     2.06     2.33   b   
#Daphnia      2.41 0.0565 270     2.30     2.52   b   
#Water        3.99 0.1021 270     3.79     4.19    c  


#### OBSERVED straight 3 way no-interaction (bc of missing samples post filtering)
rich.obs.all<- lm(Observed~ time_point+lake+organism, data=Shan.rich.df)
Anova(rich.obs.all, type=2) #lake and organism significant
plot(allEffects(rich.obs.all))
#  posthoc just looking at groups
obsrich.test.group<-emmeans(rich.obs.all, pairwise~organism)
multcomp::cld(obsrich.test.group, Letters=letters)

#organism   emmean     SE  df lower.CL upper.CL .group
# Holopedium   31.8 5.34 267     21.3     42.3  ab   
# Daphnia      33.3 1.47 267     30.4     36.2  a    
# Calanoid     37.0 1.68 267     33.7     40.3  ab   
# Cyclopoid    41.3 1.75 267     37.8     44.7   b   
# Water        98.5 2.64 267     93.3    103.7    c
  • compiled Shannon and Observed richness models, Table 2
Anova(rich.shan.reduced, type=2)
## Anova Table (Type II tests)
## 
## Response: Shannon
##            Sum Sq  Df F value    Pr(>F)    
## time_point  0.687   4  0.6039    0.6601    
## lake       18.559   5 13.0586 2.221e-11 ***
## organism   76.273   4 67.0835 < 2.2e-16 ***
## Residuals  75.894 267                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(rich.obs.all, type=2)
## Anova Table (Type II tests)
## 
## Response: Observed
##            Sum Sq  Df  F value    Pr(>F)    
## time_point   1246   4   1.5977    0.1752    
## lake         6868   5   7.0444 3.329e-06 ***
## organism    98071   4 125.7367 < 2.2e-16 ***
## Residuals   52063 267                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now testing effects of environmental variables on Shannon diversity. - first run a correlation matrix to see what variables show high correlation and should not be included in muliple analyses or variance or StepAIC model selection.

# testing effects of environmental variables on Shannon diversity
# full df = 'Shan.rich.df'

###### Compute correlation at 2 decimal places
# matrix for environmental conditions
Shan.rich.df.env<-Shan.rich.df %>% dplyr::select(latitude, elevation_m, temp_C, DOC, chla_ug.L, 
                                                 pH, DO_perc, cond, Shannon)
Shan.rich.df.env$elevation_m<-as.numeric(Shan.rich.df.env$elevation_m)

corr_matrix = round(cor(Shan.rich.df.env), 2)

# Compute and show the results
ggcorrplot(corr_matrix, hc.order = TRUE, type = "lower",
          lab = TRUE)
# strong negative correlation between elevation- cond and cond-chla (<0.69-0.73)
**Exploratory Figure**. Correlation matrix for environmental conditions and geography.

Exploratory Figure. Correlation matrix for environmental conditions and geography.

Shannon diversity according to sample type: Separate the sample types, as this is a strong predictor. Run separate analyses for water bacterioplankton and zooplankton (host-associated) microbial communities.

# separate by water or zooplankton
Shan.water.df<-Shan.rich.df[(Shan.rich.df$sample_type=="Water"),]
Shan.zoop.df<-Shan.rich.df[(Shan.rich.df$sample_type=="Zooplankton"),]

####### water first
Shan.water.df$elevation_m<-as.numeric(Shan.water.df$elevation_m)

# model 
a.div.mod.wat <- lm(Shannon ~ latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + pH + DO_perc + cond, data = Shan.water.df)
a.div.AIC.wat <- MASS::stepAIC(a.div.mod.wat, direction = "both")

# best model with AIC -74.9 ====. Shannon ~ elevation_m + cond
shan.mod.wat<-lm(Shannon ~ elevation_m + cond, data = Shan.water.df)
summary(shan.mod.wat)
shan.aov.wat<-Anova(shan.mod.wat, type=2) # near significant for both 0.051 amnd 0.08

############ zooplankton alone
# run zooplankton
Shan.zoop.df$elevation_m<-as.numeric(Shan.zoop.df$elevation_m)

# model 
a.div.mod.zoop <- lm(Shannon ~ latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + pH + DO_perc + cond, data = Shan.zoop.df)
a.div.AIC.zoop <- MASS::stepAIC(a.div.mod.zoop, direction = "both")

# best model with AIC -300.46 ==== latitude + log(DOC) + chla_ug.L + pH + DO_perc + cond
shan.mod.zoop<-lm(Shannon ~ latitude + log(DOC) + chla_ug.L + pH + DO_perc + cond, data = Shan.zoop.df)
summary(shan.mod.zoop)
shan.aov.zoop<-Anova(shan.mod.zoop, type=2)
plot(allEffects(shan.mod.zoop)) # shannon increases with elevation, DO, cond, log(DOC)
  • Best models for microbial Shannon diversity x Environment for water OR zooplankton are below.
  • These are found in Table S3..
# best models
shan.aov.wat
## Anova Table (Type II tests)
## 
## Response: Shannon
##              Sum Sq Df F value  Pr(>F)  
## elevation_m 0.26163  1  4.2053 0.05092 .
## cond        0.20037  1  3.2207 0.08482 .
## Residuals   1.55536 25                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
shan.aov.zoop
## Anova Table (Type II tests)
## 
## Response: Shannon
##           Sum Sq  Df F value   Pr(>F)    
## latitude   9.264   1 31.2184  6.1e-08 ***
## log(DOC)   1.823   1  6.1418 0.013874 *  
## chla_ug.L  0.764   1  2.5730 0.109986    
## pH         1.348   1  4.5423 0.034058 *  
## DO_perc    2.134   1  7.1913 0.007822 ** 
## cond       1.007   1  3.3927 0.066688 .  
## Residuals 73.002 246                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Best models for univariate tests of microbial Shannon diversity x Environment for water OR zooplankton are below.
  • These are found in Table S4.
########## univariate models for individual effects

## water:
# latitude, cond: NS
Anova(lm(Shannon ~ latitude, data = Shan.water.df)) # NS
## Anova Table (Type II tests)
## 
## Response: Shannon
##            Sum Sq Df F value Pr(>F)
## latitude  0.16703  1  2.6264 0.1172
## Residuals 1.65352 26
Anova(lm(Shannon ~ DO_perc, data = Shan.water.df)) # NS
## Anova Table (Type II tests)
## 
## Response: Shannon
##            Sum Sq Df F value Pr(>F)
## DO_perc   0.01955  1  0.2823 0.5997
## Residuals 1.80099 26
Anova(lm(Shannon ~ cond, data = Shan.water.df)) # NS
## Anova Table (Type II tests)
## 
## Response: Shannon
##            Sum Sq Df F value Pr(>F)
## cond      0.00356  1  0.0509 0.8233
## Residuals 1.81699 26

# summary By for median, mean, and standard error of the Shannon diversity by lake
lake_div.wat <- summaryBy(Shannon ~ lake, 
                          data = Shan.water.df, 
                          FUN = c(median, mean, sd))


## zooplankton:
# latitude signif
summary(lm(Shannon ~ latitude, data = Shan.zoop.df)) #R2=0.245
## 
## Call:
## lm(formula = Shannon ~ latitude, data = Shan.zoop.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.26037 -0.29846  0.03306  0.26442  1.82476 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -40.9667     4.7408  -8.641 6.62e-16 ***
## latitude      1.1416     0.1254   9.102  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5681 on 251 degrees of freedom
## Multiple R-squared:  0.2481, Adjusted R-squared:  0.2451 
## F-statistic: 82.84 on 1 and 251 DF,  p-value: < 2.2e-16
anova(lm(Shannon ~ latitude, data = Shan.zoop.df)) # p<0.001 alone
## Analysis of Variance Table
## 
## Response: Shannon
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## latitude    1 26.733 26.7332  82.838 < 2.2e-16 ***
## Residuals 251 81.002  0.3227                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#DO 
summary(lm(Shannon ~ DO_perc, data=Shan.zoop.df)) # adjust-R2=0.172
## 
## Call:
## lm(formula = Shannon ~ DO_perc, data = Shan.zoop.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0291 -0.3251  0.0501  0.4019  1.7145 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.19384    0.60094  -3.651 0.000318 ***
## DO_perc      0.05865    0.00804   7.294 3.93e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5951 on 251 degrees of freedom
## Multiple R-squared:  0.1749, Adjusted R-squared:  0.1716 
## F-statistic:  53.2 on 1 and 251 DF,  p-value: 3.928e-12
anova(lm(Shannon ~ DO_perc, data = Shan.zoop.df)) #p<0.001
## Analysis of Variance Table
## 
## Response: Shannon
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## DO_perc     1 18.843 18.8426  53.205 3.928e-12 ***
## Residuals 251 88.892  0.3542                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#DOC
summary(lm(Shannon ~ log(DOC), data=Shan.zoop.df)) # adjust-R2=0.004
## 
## Call:
## lm(formula = Shannon ~ log(DOC), data = Shan.zoop.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.95262 -0.45603  0.09154  0.46876  1.48279 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.13651    0.05122   41.72   <2e-16 ***
## log(DOC)     0.04215    0.02907    1.45    0.148    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6524 on 251 degrees of freedom
## Multiple R-squared:  0.008307,   Adjusted R-squared:  0.004356 
## F-statistic: 2.102 on 1 and 251 DF,  p-value: 0.1483
shan.DOC.zoop<-anova(lm(Shannon ~ log(DOC), data = Shan.zoop.df)) #p=0.148

#cond
summary(lm(Shannon ~ cond, data=Shan.zoop.df)) # adjust-R2=0.160
## 
## Call:
## lm(formula = Shannon ~ cond, data = Shan.zoop.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.07163 -0.37082  0.07253  0.36993  1.78337 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.729535   0.074694   23.16  < 2e-16 ***
## cond        0.005992   0.000856    7.00 2.34e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5993 on 251 degrees of freedom
## Multiple R-squared:  0.1633, Adjusted R-squared:   0.16 
## F-statistic: 48.99 on 1 and 251 DF,  p-value: 2.335e-11
shan.DOC.zoop<-anova(lm(Shannon ~ cond, data = Shan.zoop.df)) #p=0.148

# summary By for median, mean, and standard error of the Shannon diversity by lake
lake_div.zoop <- summaryBy(Shannon ~ lake, 
                          data = Shan.zoop.df, 
                          FUN = c(median, mean, sd))
  • run code to generate plot of Microbial Shannon diversity.
  • these will be groups and displayed later with relative abundance plots in Figure 5.
### plots
lat.shan.plot.zoop<-ggplot(Shan.rich.df, aes(x=latitude, y= Shannon, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  geom_point(size=1.5, alpha=0.6)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  ylab("Shannon diversity")+
  xlab("Latitude (degrees N)")+
  theme_classic() +
  annotate(geom="text", x=37.7, y=0.1, label="Zoop. Adjusted-R2=0.245, p<0.001", size=3,
           color="black")+
  annotate(geom="text", x=37.7, y=4.2, label="Water NS", size=3,
           color="black")

DO.shan.plot.zoop<-ggplot(Shan.rich.df, aes(x=DO_perc, y= Shannon, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  geom_point(size=1.5, alpha=0.6)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  ylab("Shannon diversity")+
  xlab("Dissolved Oxygen (%)")+
  theme_classic() +
  annotate(geom="text", x=80, y=0.1, label="Zoop. Adjusted-R2=0.172, p<0.001", size=3,
           color="black") +
annotate(geom="text", x=70, y=4.1, label="Water NS", size=3,
           color="black")

cond.shan.plot.zoop<-ggplot(Shan.rich.df, aes(x=cond, y=Shannon, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  geom_point(size=1.5, alpha=0.6)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  ylab("Shannon diversity")+
  xlab("Conductivity (uS/cm)") +
  theme_classic() +
  annotate(geom="text", x=125, y=0.1, label="Zoop. Adjusted-R2=0.160, p<0.001", size=3,
           color="black")+
  annotate(geom="text", x=125, y=4.1, label="Water NS", size=3,
           color="black")

DOC.shan.plot.zoop<-ggplot(Shan.rich.df, aes(x=log(DOC), y= Shannon, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  geom_point(size=1.5, alpha=0.6)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  ylab("Shannon diversity")+
  xlab("log(Dissolved organic carbon) (mg/L)")+
  theme_classic() +
  annotate(geom="text", x=3, y=0.1, label="Water and Zoop NS", size=3,
           color="black")

## get legend
extract.legend.Shan <- get_legend(
  # create some space to the left of the legend
  cond.shan.plot.zoop + theme(legend.box.margin = margin(0, 0, 0, 10)))
 
multregres.plots.shannon.env.wat.zoop<-plot_grid(lat.shan.plot.zoop + guides(color="none", fill="none"),
                                             DO.shan.plot.zoop + guides(color="none", fill="none"),
                                             cond.shan.plot.zoop + guides(color="none", fill="none"),
                                             extract.legend.Shan,
                            ncol=3, nrow=1, labels=c("A", "B", "C", ""), rel_widths=c(5,5,5,2))

# summary By for median, mean, and standard error of the Shannon diversity by lake
lake_div <- summaryBy(Shannon ~ lake, data = Shan.rich.df, FUN = c(median, mean, sd))

For ordination use PCoA [rarifed data, db-Bray Curtis], compare with unweighted unifrac [rarified with phylogenetic info] - perform PCoA on all samples - perform PCoA on rarified 500 reads - perform PCoA on rarified-hellinger transformed.


###### check the # of samples in each taxa
table(sample_data(PS.500)$organism)
## 
##    Daphnia Holopedium   Calanoid  Cyclopoid      Water 
##         98          8         74         73         28
table(sample_data(PS.500)$sample_type) 
## 
##       Water Zooplankton 
##          28         253
#284 samples: 28 water, 253 zooplankton 

# rarefied non-transformed
set.seed(341)
ord.500<- ordinate(PS.500, method = 'PCoA', distance ="bray")

#########
#PCoA Bray
PCoA.rar.bray<- plot_ordination(
  physeq=PS.500, 
  ordination = ord.500) +
  geom_point(aes(color = organism, shape=sample_type), size = 1.5) +
  stat_ellipse(level=0.95, linetype = 2, aes(color=organism)) +
  scale_color_manual(values=c("coral2", "goldenrod2", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  scale_shape_manual(values= c(1,16))+
  ggtitle("Bray-Curtis") +
  theme_classic()
PCoA.rar.bray$layers <- PCoA.rar.bray$layers[-1]

library("colorBlindness")
#cvdPlot(PCoA.rar.bray) # see that the pallete is colorblind friendly


####### 
# make phylogenetic tree and add to phyloseq if wanting to do unifrac weight/unweighted
set.seed(341)
random_tree = rtree(ntaxa(PS.500), rooted=TRUE, tip.label=taxa_names(PS.500))

# add in, use unweighted since we have low abundance taxa
phy_tree(PS.500)<-random_tree

set.seed(341)
ord.unif <- ordinate(PS.500, method = 'PCoA', distance ="unifrac") # unweighted

PCoA.rar.unw.unif<- plot_ordination(
  physeq=PS.500, 
  ordination = ord.unif) +
  geom_point(aes(color = organism, shape=sample_type), size = 1.5) +    
  stat_ellipse(level=0.95, linetype = 2, aes(color=organism)) +
  scale_color_manual(values=c("coral2", "goldenrod2", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  scale_shape_manual(values= c(1,16))+
  ggtitle("Unweighted Unifrac") +
  theme_classic() 
PCoA.rar.unw.unif$layers <- PCoA.rar.unw.unif$layers[-1]


##### for comparison , see weighted unifrac
set.seed(341)
ord.Wunif <- ordinate(PS.500, method = 'PCoA', distance ="wunifrac")

PCoA.rar.W.unif<- plot_ordination(
  physeq=PS.500, 
  ordination = ord.Wunif) +
  geom_point(aes(color = organism, shape=sample_type), size = 1.5) +    
  stat_ellipse(level=0.95, linetype = 2, aes(color=organism)) +
  scale_color_manual(values=c("coral2", "goldenrod2", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  scale_shape_manual(values= c(1,16))+
  ggtitle("Weighted Unifrac") +
  theme_classic() 
PCoA.rar.W.unif$layers <- PCoA.rar.W.unif$layers[-1]


#########
extract.legend.PCoA <- get_legend(
  # create some space to the left of the legend
  PCoA.rar.W.unif + theme(legend.box.margin = margin(0, 0, 0, 10)))

# make combined plot
PCoA.and.Unifracs<-plot_grid(PCoA.rar.bray +  guides(color="none", shape="none"),
                            PCoA.rar.W.unif + guides(color="none",  shape="none"), 
                            PCoA.rar.unw.unif +  guides(color="none",  shape="none"), extract.legend.PCoA, 
                            ncol=4, rel_widths=c(5,5,5,2), labels = c("A","B", "C"))

PCoA.and.Unifracs
dev.copy(pdf, "figures/FigS5.PCoA.Unifrac.pdf", height=4, width=12)
## pdf 
##   3
dev.off()
## quartz_off_screen 
##                 2
**Figure S5**. Analysis of zooplankton-associated and free-living (water) bacteria community beta diversity using (A) principal coordinate analysis (PCoA) of Bray-Curtis dissimilarity and (B) weighted and (C) unweighted unifrac distance.

Figure S5. Analysis of zooplankton-associated and free-living (water) bacteria community beta diversity using (A) principal coordinate analysis (PCoA) of Bray-Curtis dissimilarity and (B) weighted and (C) unweighted unifrac distance.

Run PERMANOVA on the dissimilarity. The code here is meant to give PERMANOVA tests of main effects, tests of environmental effects in Permanova, and tests of betadispersion for different groups. - first, we will run analysis for all samples - then for zooplankton alone - then for water alone - output from models here will be in *Table S

####################################
# first let's look at PERMANOVA and NDMS for the full dataset
# rarefied
set.seed(1000)
bc_500 <- phyloseq::distance(PS.500, method = "bray") # rarefied

# run NMDS on Bray distance object (defined above for 'bc_500'), this is
nmds.rar <- metaMDS(bc_500,
          distance = "bray",
          k = 3,
          maxit = 999, 
          wascores = TRUE)


# get NMDS 1-3, we will use this later in NMDS plots
scores.full <- as.data.frame(scores(nmds.rar, display = "sites"))
scores.1.2<-scores.full[-3] # subset to just give the 2 axis space for NMDs1-2

nmds.rar.df <- merge(sample_data(PS.500), scores.1.2, by = 'row.names', all = TRUE)
  • calculate distance to centroids from NMDS 1 and 2
######
# inspect beta diversity and distance to centroid a bit: (zoop species and water)
# using the rarefied here since beta diversity metric

######## let's use  approach for NMDS only
# calculate mean distances to centroid manually (use 'grp.median' if using not using centroid in betadisper)
groups<-nmds.rar.df$organism

df=NULL
for(gr in unique(groups)) {
  organism=print(gr)
  grp.mean = colMeans(scores.1.2[groups == gr,])
  # calculate the euclidean distance between centroid point and all other points
  dist.to.centr = apply(scores.1.2[groups == gr,], 1, function(xx) {sqrt(sum((xx - grp.mean)^2))})
  info = print(paste0('all axes mean dist to centroid: ', round(mean(dist.to.centr),4) )) # get the mean of all distances
  df = rbind(df, data.frame(organism, dist.to.centr))
}

bdisp.df<-df
bdisp.df$organism<-as.factor(bdisp.df$organism)
bdisp.df$organism<-factor(bdisp.df$organism, levels =c("Daphnia", "Holopedium", 
                                   "Calanoid", "Cyclopoid", "Water"))
bdisp.df$dist.to.centr<-as.numeric(bdisp.df$dist.to.centr)

### box plot for distance
beta.dist.box<- ggplot(data=bdisp.df, aes(x=organism, y=dist.to.centr, color=organism)) +
  geom_boxplot(aes(fill=organism), alpha=0.4) + guides(fill="none") +
  geom_point(alpha=0.5) + 
  ylim(0,1) +
  guides(color=guide_legend(title=""))+ xlab("Sample Type")+ ylab("Distance to centroid") +
  ggtitle("Beta diversity") +
  scale_fill_manual(values=c("coral2", "gold3", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  scale_color_manual(values=c("coral2", "gold3", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  theme(axis.text.x=element_blank()) + theme_bw()+
   theme(axis.text.x = element_text(angle = 20, vjust = 0.5, hjust=1))


# test of distance to centroids
beta.mod<-lm(dist.to.centr ~ organism, data = bdisp.df)
posthoc.betad<-emmeans(beta.mod, pairwise~organism)
multcomp::cld(posthoc.betad, Letters=letters)

#organism   emmean     SE  df lower.CL upper.CL .group
# Water       0.161 0.0603 276   0.0426    0.280  a    
# Daphnia     0.352 0.0322 276   0.2890    0.416   b   
# Cyclopoid   0.474 0.0373 276   0.4008    0.548   b   
# Holopedium  0.475 0.1127 276   0.2529    0.697  abc  
# Calanoid    0.707 0.0371 276   0.6336    0.780    c 
 
# notused, but just in case...
kruskal_test(dist.to.centr ~ organism, data = bdisp.df)
dunn_test(dist.to.centr ~ organism, data = bdisp.df)
  • combine the plots
  • multi-panel alpha diversity plot from above with beta diversity distance to centroid
# combine with richness
alpha.beta.rich.plot<-plot_grid(alpha.time.lake, richness.spp.shan, 
                           richness.spp.obs + guides(color="none", fill="none"), 
                           beta.dist.box + guides(color="none", fill="none"),
                      ncol=4, rel_widths=c(12,6,6,6))
alpha.beta.rich.plot
**Fig 4.** Shannon diversity of bacterial communities across (A) sampling time points, (B) lakes, and (C) among zooplankton and water samples. (D) Observed ASV richness by sample type (zooplankton and water). (E) Beta diversity as average distance to group centroids for two axes of non-metric multidimensional scaling ordination (NMDS1-2).

Fig 4. Shannon diversity of bacterial communities across (A) sampling time points, (B) lakes, and (C) among zooplankton and water samples. (D) Observed ASV richness by sample type (zooplankton and water). (E) Beta diversity as average distance to group centroids for two axes of non-metric multidimensional scaling ordination (NMDS1-2).

dev.copy(pdf, "figures/Fig4.alpha.beta.rich.plot.pdf", height=6, width=16)
dev.off()

PERMANOVA

Run PERMANOVA for beta diversity, testing for effects of Time, Lake, Organism. Run PERMANOVA for Bray-Curtis, Unifrac, and Weighted Unifrac.
- This is Table 3.

###############
# Permanova with 3 methods
# rarefied
bc_500 <- phyloseq::distance(PS.500, method = "bray") # rarefied
unif_500 <- phyloseq::distance(PS.500, method = "unifrac") # rarefied
wunif_500 <- phyloseq::distance(PS.500, method = "wunifrac") # rarefied

# run PERMANOVA with Time, Lake, Organisms
# Bray-Curtis from phyloseq: bc_500
set.seed(781)
broad.model.zoop <- adonis2(bc_500 ~ time_point*lake*organism, 
                            data = nmds.rar.df, method="bray", permutations = 999, by="terms")
broad.model.zoop

#test for beta dispersion
anova(betadisper(bc_500, nmds.rar.df$lake)) #  not-signif (barely)
anova(betadisper(bc_500, nmds.rar.df$time_point)) #  significant *barely
anova(betadisper(bc_500, nmds.rar.df$organism)) # very significant

pairwise.adonis(bc_500, factors=nmds.rar.df$time_point, p.adjust.m="holm")
pairwise.adonis(bc_500, factors=nmds.rar.df$lake, p.adjust.m="holm")
pairwise.adonis(bc_500, factors=nmds.rar.df$organism, p.adjust.m="holm")


##### 
### WEIGHTED Unifrac from phyloseq: wunif_500
set.seed(71)
broad.model.zoop.wunif <- adonis2(wunif_500 ~ time_point*lake*organism, 
                            data = nmds.rar.df, method="wunifrac", permutations = 999, by="terms")
broad.model.zoop.wunif

#test for beta dispersion
anova(betadisper(wunif_500, nmds.rar.df$lake)) #  not-signif (barely)
anova(betadisper(wunif_500, nmds.rar.df$time_point)) #  significant *barely
anova(betadisper(wunif_500, nmds.rar.df$organism)) # very significant

pairwise.adonis(wunif_500, factors=nmds.rar.df$time_point, p.adjust.m="holm")
pairwise.adonis(wunif_500, factors=nmds.rar.df$lake, p.adjust.m="holm")
pairwise.adonis(wunif_500, factors=nmds.rar.df$organism, p.adjust.m="holm")


##### 
### UNWEIGHTED Unifrac from phyloseq: unif_500
set.seed(71)
broad.model.zoop.UNWunif <- adonis2(unif_500 ~ time_point*lake*organism, 
                            data = nmds.rar.df, method="unifrac", permutations = 999, by="terms")
broad.model.zoop.UNWunif

#test for beta dispersion
anova(betadisper(unif_500, nmds.rar.df$lake)) #  not-signif (barely)
anova(betadisper(unif_500, nmds.rar.df$time_point)) #  significant *barely
anova(betadisper(unif_500, nmds.rar.df$organism)) # very significant

pairwise.adonis(unif_500, factors=nmds.rar.df$time_point, p.adjust.m="holm")
pairwise.adonis(unif_500, factors=nmds.rar.df$lake, p.adjust.m="holm")
  • outputs of PERMANOVA as: Bray-Curtis, Weighted Unifrac, Unweighted Unifrac
broad.model.zoop
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = bc_500 ~ time_point * lake * organism, data = nmds.rar.df, permutations = 999, method = "bray", by = "terms")
##                           Df SumOfSqs      R2        F Pr(>F)    
## time_point                 4    7.163 0.06109  26.5221  0.001 ***
## lake                       5   15.527 0.13242  45.9902  0.001 ***
## organism                   4   32.520 0.27735 120.4040  0.001 ***
## time_point:lake           20   13.550 0.11556  10.0337  0.001 ***
## time_point:organism       14    7.086 0.06043   7.4956  0.001 ***
## lake:organism             15   19.193 0.16369  18.9498  0.001 ***
## time_point:lake:organism  39   10.129 0.08638   3.8462  0.001 ***
## Residual                 179   12.087 0.10308                    
## Total                    280  117.256 1.00000                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
broad.model.zoop.wunif
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = wunif_500 ~ time_point * lake * organism, data = nmds.rar.df, permutations = 999, method = "wunifrac", by = "terms")
##                           Df SumOfSqs      R2        F Pr(>F)    
## time_point                 4    4.657 0.06941  28.1627  0.001 ***
## lake                       5    8.915 0.13287  43.1288  0.001 ***
## organism                   4   18.264 0.27220 110.4458  0.001 ***
## time_point:lake           20    7.072 0.10540   8.5530  0.001 ***
## time_point:organism       14    4.242 0.06323   7.3299  0.001 ***
## lake:organism             15   10.569 0.15752  17.0436  0.001 ***
## time_point:lake:organism  39    5.978 0.08909   3.7075  0.001 ***
## Residual                 179    7.400 0.11029                    
## Total                    280   67.098 1.00000                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
broad.model.zoop.UNWunif
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = unif_500 ~ time_point * lake * organism, data = nmds.rar.df, permutations = 999, method = "unifrac", by = "terms")
##                           Df SumOfSqs      R2       F Pr(>F)    
## time_point                 4    4.247 0.05249  7.6855  0.001 ***
## lake                       5    9.524 0.11771 13.7875  0.001 ***
## organism                   4   13.612 0.16824 24.6318  0.001 ***
## time_point:lake           20    8.597 0.10625  3.1113  0.001 ***
## time_point:organism       14    3.807 0.04705  1.9681  0.001 ***
## lake:organism             15    9.021 0.11150  4.3533  0.001 ***
## time_point:lake:organism  39    7.371 0.09110  1.3680  0.001 ***
## Residual                 179   24.730 0.30565                   
## Total                    280   80.910 1.00000                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

NMDS and envfit: By-group

Run NMDS with envfit for water, daphnia, calanoids, cyclopoids to better understand how environment is impacting these host-microbiomes. We will run NMDS and fit env.fit, perform PERMANOVA, test betadispersion, and test env conditions on BC matrix. Collectively, these outputs make Table 4.

  • First, run Daphnia
  • Holopedium too few to be informative here
# run same analysis but subset to only be cladocerans
# sample data and OTU table for vegan
# subset
daph.phylo<-subset_samples(PS.500, organism=="Daphnia")
daph.phylo <- prune_taxa(taxa_sums(daph.phylo) >0, daph.phylo) # remove those asvs with 0 sum

####################################
# first let's look at PERMANOVA for the full dataset
# rarefied
set.seed(1000)
bc_daph <- phyloseq::distance(daph.phylo, method = "bray") # rarefied

# run NMDS on Bray distance object (defined above for 'bc_500.hell'), this is
set.seed(520)
nmds.daph.rar <- metaMDS(bc_daph,
          distance = "bray",
          k = 3,
          maxit = 999, 
          wascores = TRUE)

# get NMDS 1-3
sp.daph.scores <- as.data.frame(scores(nmds.daph.rar, display = "sites"))
nmds.daph.rar.df <- merge(sample_data(daph.phylo), sp.daph.scores, by = 'row.names', all = TRUE)

#### pick envr. data and run the envfit function
env.daph<- nmds.daph.rar.df %>%
  dplyr::select(latitude, longitude, elevation_m, temp_C, chla_ug.L, pH, DO_perc, cond, DOC, TDN, TDP)

set.seed(123)
vf.daph <- envfit(nmds.daph.rar, env.daph, perm = 999, na.rm = TRUE) 
# lots of significance, except elevation, pH, DOC, TDN,    highest significance lat/long, spc, cond

### grab vectors and put into df
spp.daph.scrs <- as.data.frame(scores(vf.daph, display = "vectors")) *ordiArrowMul(vf.daph)
spp.daph.scrs.test <- cbind(spp.daph.scrs, r=vf.daph$vectors[2], pvals=vf.daph$vectors[4], 
                       Envfact = rownames(spp.daph.scrs))
names(spp.daph.scrs.test)[names(spp.daph.scrs.test) == 'r'] <- 'r2' # rename bc vector has naming issues

# subset to keep only those Envfactors that are significant 
spp.daph.scrs.signif<-spp.daph.scrs.test[spp.daph.scrs.test$pvals< 0.05, ]

# see color hexadecimal
# Hexadecimal color specification 
brewer.pal(n = 6, name = "GnBu")
#scale_color_brewer(palette="GnBu", n=6, direction=-1) + if want to use the palette and reverse order

# inspect
plot(nmds.daph.rar)
plot(vf.daph)

# make the plot
NMDS.daph <- ggplot(nmds.daph.rar.df) +
  geom_point(mapping = aes(x = NMDS1, y = NMDS2, colour = lake), size= 2) +
  #coord_fixed(ratio=1.6) + 
  xlim(-2.2,1.7)+
  ylim(-1.5,1.5)+
  stat_ellipse(level=0.95, linetype = 2, linewidth=0.6, aes(x = NMDS1, y = NMDS2, color=lake)) +
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  ggtitle("Daphnia")+
  geom_segment(data = spp.daph.scrs.signif,
               aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
               arrow = arrow(length = unit(0.25, "cm")), colour = "goldenrod4") +
  geom_text(data = spp.daph.scrs.signif, aes(x = NMDS1, y = NMDS2, label = Envfact), color="goldenrod4",
            size = 2.5)+
  theme_classic() +
  annotate(geom="text", x=1, y=-1.5, label="Stress=0.104", size=3,
           color="black")

### permanova and betadisp
# run PERMANOVA
set.seed(781)
broad.model.daph <- adonis2(bc_daph ~ time_point * lake, 
                            data = nmds.daph.rar.df, permutations = 999, by="terms")
broad.model.daph # signif lake and time effect, driven by differences in time 2

pairwise.adonis(bc_daph, factors=nmds.daph.rar.df$lake, p.adjust.m="holm")

# Env model
env.model.daph <- adonis2(bc_daph ~ latitude + elevation_m + temp_C + pH + cond + 
                            DOC + chla_ug.L + DO_perc, data = nmds.daph.rar.df, permutations = 999, by="terms")
env.model.daph # latitude is best (0.12), elevation (0.07), cond (0.07)

# betadispersion
anova(betadisper(bc_daph, nmds.daph.rar.df$time)) # not signf, (p=0.234)
anova(betadisper(bc_daph, nmds.daph.rar.df$lake)) # not significant, (p=0.526)

# alternative means of finding best model
daph.rar.env<- nmds.daph.rar.df %>% dplyr::select(latitude, elevation_m, temp_C, pH, cond, DOC, chla_ug.L, DO_perc)
bioenv(bc_daph, daph.rar.env, scale.var = TRUE)
# best model ph and conductivity = 0.49
  • Now run NMDS-envfit for calanoids
# subset
calan.phylo<-subset_samples(PS.500, organism=="Calanoid")
calan.phylo <- prune_taxa(taxa_sums(calan.phylo) >0, calan.phylo) # remove those asvs with 0 sum

####################################
# first let's look at PERMANOVA for the full dataset
# rarefied
set.seed(1000)
bc_calan <- phyloseq::distance(calan.phylo, method = "bray") # rarefied

# run NMDS on Bray distance object (defined above for 'bc_500')
set.seed(520)
nmds.calan.rar <- metaMDS(bc_calan,
          distance = "bray",
          k = 3,
          maxit = 999, 
          wascores = TRUE)
## Run 0 stress 0.1169913 
## Run 1 stress 0.1169871 
## ... New best solution
## ... Procrustes: rmse 0.0008545327  max resid 0.006062541 
## ... Similar to previous best
## Run 2 stress 0.1287203 
## Run 3 stress 0.128719 
## Run 4 stress 0.1169916 
## ... Procrustes: rmse 0.0009001259  max resid 0.006126756 
## ... Similar to previous best
## Run 5 stress 0.1169914 
## ... Procrustes: rmse 0.0008826625  max resid 0.006215649 
## ... Similar to previous best
## Run 6 stress 0.1169913 
## ... Procrustes: rmse 0.0008348449  max resid 0.005924063 
## ... Similar to previous best
## Run 7 stress 0.1287192 
## Run 8 stress 0.1169914 
## ... Procrustes: rmse 0.0008871101  max resid 0.006118204 
## ... Similar to previous best
## Run 9 stress 0.1169916 
## ... Procrustes: rmse 0.0008985489  max resid 0.006117903 
## ... Similar to previous best
## Run 10 stress 0.1169915 
## ... Procrustes: rmse 0.0008884233  max resid 0.006120264 
## ... Similar to previous best
## Run 11 stress 0.1169913 
## ... Procrustes: rmse 0.0008545407  max resid 0.006088684 
## ... Similar to previous best
## Run 12 stress 0.1169913 
## ... Procrustes: rmse 0.0008553331  max resid 0.006061379 
## ... Similar to previous best
## Run 13 stress 0.116987 
## ... New best solution
## ... Procrustes: rmse 9.133958e-05  max resid 0.0006307062 
## ... Similar to previous best
## Run 14 stress 0.1282688 
## Run 15 stress 0.116987 
## ... New best solution
## ... Procrustes: rmse 2.283095e-05  max resid 7.726986e-05 
## ... Similar to previous best
## Run 16 stress 0.1287199 
## Run 17 stress 0.116987 
## ... New best solution
## ... Procrustes: rmse 2.471928e-05  max resid 7.575641e-05 
## ... Similar to previous best
## Run 18 stress 0.1169914 
## ... Procrustes: rmse 0.0008487925  max resid 0.006095435 
## ... Similar to previous best
## Run 19 stress 0.1169871 
## ... Procrustes: rmse 8.91657e-05  max resid 0.0002864968 
## ... Similar to previous best
## Run 20 stress 0.116987 
## ... Procrustes: rmse 4.461724e-05  max resid 0.0002588049 
## ... Similar to previous best
## *** Best solution repeated 4 times

# get NMDS 1-3
sp.calan.scores <- as.data.frame(scores(nmds.calan.rar, display = "sites"))
nmds.calan.rar.df <- merge(sample_data(calan.phylo), sp.calan.scores, by = 'row.names', all = TRUE)


#### pick envr. data and run the envfit function
env.calan<- nmds.calan.rar.df %>%
  dplyr::select(latitude, longitude, elevation_m, temp_C, chla_ug.L, pH, DO_perc, cond, DOC, TDN, TDP)

set.seed(123)
vf.calan <- envfit(nmds.calan.rar, env.calan, perm = 999, na.rm = TRUE) 
# lots of significance, except elevation, pH, DOC, TDN,    highest significance lat/long, spc, cond

# inspect
plot(nmds.calan.rar)
plot(vf.calan)

### grab vectors and put into df
spp.calan.scrs <- as.data.frame(scores(vf.calan, display = "vectors")) *ordiArrowMul(vf.calan)
spp.calan.scrs.test <- cbind(spp.calan.scrs, r=vf.calan$vectors[2], pvals=vf.calan$vectors[4], 
                       Envfact = rownames(spp.calan.scrs))
names(spp.calan.scrs.test)[names(spp.calan.scrs.test) == 'r'] <- 'r2' # rename bc vector has naming issues

# subset to keep only those Envfactors that are significant 
spp.calan.scrs.signif<-spp.calan.scrs.test[spp.calan.scrs.test$pvals< 0.05, ]

# make the plot
NMDS.calan <- ggplot(nmds.calan.rar.df) +
  geom_point(mapping = aes(x = NMDS1, y = NMDS2, colour = lake), size=2) +
  #coord_fixed(ratio=0.8) +
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  stat_ellipse(level=0.95, linetype = 2, linewidth=0.6, aes(x = NMDS1, y = NMDS2, color=lake)) +
  ggtitle("Calanoids")+
  geom_segment(data = spp.calan.scrs.signif,
               aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
               arrow = arrow(length = unit(0.25, "cm")), colour = "goldenrod4") +
  geom_text(data = spp.calan.scrs.signif, 
            aes(x = NMDS1, y = NMDS2, label = Envfact), color="goldenrod4",
            size = 2.5)+
  theme_classic() +
  annotate(geom="text", x=1, y=-4, label="Stress=0.117", size=3,
           color="black")

### permanova and betadisp
# run PERMANOVA
set.seed(781)
broad.model.calan <- adonis2(bc_calan ~ time_point * lake, 
                            data = nmds.calan.rar.df, permutations = 999, by="terms")
broad.model.calan # signif lake and time effect, driven by differences in time 1 and 5 (whcih are similar) 
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = bc_calan ~ time_point * lake, data = nmds.calan.rar.df, permutations = 999, by = "terms")
##                 Df SumOfSqs      R2       F Pr(>F)    
## time_point       4   4.2517 0.15935 12.4993  0.001 ***
## lake             5  12.1420 0.45507 28.5560  0.001 ***
## time_point:lake 12   5.8656 0.21984  5.7479  0.001 ***
## Residual        52   4.4221 0.16574                   
## Total           73  26.6813 1.00000                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise.adonis(bc_calan, factors=nmds.calan.rar.df$lake, p.adjust.m="holm")
##                        pairs Df SumsOfSqs   F.Model        R2 p.value
## 1        Convict vs Virginia  1 2.3327008 10.728341 0.3608792   0.001
## 2          Convict vs Serene  1 4.9549790 24.775083 0.4215236   0.001
## 3   Convict vs Eastern Brook  1 4.0811247 22.509590 0.4640235   0.001
## 4          Convict vs Cooney  1 3.5235024 15.964676 0.3994697   0.001
## 5            Convict vs Blue  1 3.8454564 26.483432 0.5577405   0.001
## 6         Virginia vs Serene  1 1.8118089  7.934177 0.2409101   0.001
## 7  Virginia vs Eastern Brook  1 1.4129636  6.630184 0.2805811   0.001
## 8         Virginia vs Cooney  1 0.7707561  2.748856 0.1548751   0.015
## 9           Virginia vs Blue  1 1.1060710  6.778173 0.3609602   0.002
## 10   Serene vs Eastern Brook  1 0.7008517  3.564637 0.1002298   0.005
## 11          Serene vs Cooney  1 1.9881965  8.676289 0.2243310   0.001
## 12            Serene vs Blue  1 3.4423731 20.087820 0.4266033   0.001
## 13   Eastern Brook vs Cooney  1 1.5752009  7.236867 0.2475254   0.001
## 14     Eastern Brook vs Blue  1 3.2025702 23.947788 0.5576024   0.001
## 15            Cooney vs Blue  1 2.8197042 15.344657 0.4744109   0.001
##    p.adjusted sig
## 1       0.015   .
## 2       0.015   .
## 3       0.015   .
## 4       0.015   .
## 5       0.015   .
## 6       0.015   .
## 7       0.015   .
## 8       0.015   .
## 9       0.015   .
## 10      0.015   .
## 11      0.015   .
## 12      0.015   .
## 13      0.015   .
## 14      0.015   .
## 15      0.015   .

# Env model
env.model.calan <- adonis2(bc_calan ~ latitude + elevation_m + temp_C + pH + cond + 
                            DOC + chla_ug.L + DO_perc, data = nmds.calan.rar.df, permutations = 999)
env.model.calan # elevation (0.22), latitude is best (0.13)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = bc_calan ~ latitude + elevation_m + temp_C + pH + cond + DOC + chla_ug.L + DO_perc, data = nmds.calan.rar.df, permutations = 999)
##          Df SumOfSqs      R2      F Pr(>F)    
## Model     8   13.537 0.50735 8.3673  0.001 ***
## Residual 65   13.145 0.49265                  
## Total    73   26.681 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# betadispersion
anova(betadisper(bc_calan, nmds.calan.rar.df$time)) # NS
## Analysis of Variance Table
## 
## Response: Distances
##           Df  Sum Sq   Mean Sq F value Pr(>F)
## Groups     4 0.00501 0.0012535  0.0638 0.9923
## Residuals 69 1.35551 0.0196450
anova(betadisper(bc_calan, nmds.calan.rar.df$lake)) # significant (p=0.004))
## Analysis of Variance Table
## 
## Response: Distances
##           Df  Sum Sq  Mean Sq F value   Pr(>F)   
## Groups     5 0.43501 0.087002  3.8364 0.004034 **
## Residuals 68 1.54213 0.022678                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# alternative means of finding best model
calan.rar.env<- nmds.calan.rar.df %>% dplyr::select(latitude, elevation_m, temp_C, pH, cond, DOC, chla_ug.L, DO_perc)
bioenv(bc_calan, calan.rar.env, scale.var = TRUE)
## 
## Call:
## bioenv(comm = bc_calan, env = calan.rar.env, scale.var = TRUE) 
## 
## Subset of environmental variables with best correlation to community data.
## 
## Correlations:    spearman 
## Dissimilarities: bray 
## Metric:          euclidean 
## 
## Best model has 2 parameters (max. 8 allowed):
## latitude elevation_m
## with correlation  0.739184
# best model latitude, elevation_m = 0.74
  • Now run NMDS-envfit for cyclopoids
# subset
cyclop.phylo<-subset_samples(PS.500, organism=="Cyclopoid")
cyclop.phylo <- prune_taxa(taxa_sums(cyclop.phylo) >0, cyclop.phylo) # remove those asvs with 0 sum

####################################
# first let's look at PERMANOVA for the full dataset
# rarefied
set.seed(1000)
bc_cyclop <- phyloseq::distance(cyclop.phylo, method = "bray") # rarefied

# run NMDS on Bray distance object
set.seed(520)
nmds.cyclop.rar <- metaMDS(bc_cyclop,
          distance = "bray",
          k = 3,
          maxit = 999, 
          wascores = TRUE)
## Run 0 stress 0.1289183 
## Run 1 stress 0.1289173 
## ... New best solution
## ... Procrustes: rmse 0.0002791642  max resid 0.001415977 
## ... Similar to previous best
## Run 2 stress 0.1289169 
## ... New best solution
## ... Procrustes: rmse 0.0001900924  max resid 0.0009576245 
## ... Similar to previous best
## Run 3 stress 0.1289177 
## ... Procrustes: rmse 0.0003341352  max resid 0.001677328 
## ... Similar to previous best
## Run 4 stress 0.1289173 
## ... Procrustes: rmse 0.0001675686  max resid 0.0008438084 
## ... Similar to previous best
## Run 5 stress 0.1289176 
## ... Procrustes: rmse 0.0005813276  max resid 0.002390621 
## ... Similar to previous best
## Run 6 stress 0.1289182 
## ... Procrustes: rmse 0.0004177733  max resid 0.002066896 
## ... Similar to previous best
## Run 7 stress 0.128918 
## ... Procrustes: rmse 0.0007037498  max resid 0.003397916 
## ... Similar to previous best
## Run 8 stress 0.1289176 
## ... Procrustes: rmse 0.0002662535  max resid 0.001338227 
## ... Similar to previous best
## Run 9 stress 0.1289173 
## ... Procrustes: rmse 0.0001883257  max resid 0.0009499089 
## ... Similar to previous best
## Run 10 stress 0.1289174 
## ... Procrustes: rmse 0.0001697551  max resid 0.0008425182 
## ... Similar to previous best
## Run 11 stress 0.1289171 
## ... Procrustes: rmse 0.0003841533  max resid 0.001410887 
## ... Similar to previous best
## Run 12 stress 0.1289178 
## ... Procrustes: rmse 0.0005804467  max resid 0.002114602 
## ... Similar to previous best
## Run 13 stress 0.1289177 
## ... Procrustes: rmse 0.0006340496  max resid 0.003262067 
## ... Similar to previous best
## Run 14 stress 0.1289179 
## ... Procrustes: rmse 0.0003754745  max resid 0.001888904 
## ... Similar to previous best
## Run 15 stress 0.1289176 
## ... Procrustes: rmse 0.0002764077  max resid 0.001389293 
## ... Similar to previous best
## Run 16 stress 0.1289178 
## ... Procrustes: rmse 0.0003500607  max resid 0.001756927 
## ... Similar to previous best
## Run 17 stress 0.1289178 
## ... Procrustes: rmse 0.0003515072  max resid 0.001822232 
## ... Similar to previous best
## Run 18 stress 0.1289169 
## ... New best solution
## ... Procrustes: rmse 1.658911e-05  max resid 7.407717e-05 
## ... Similar to previous best
## Run 19 stress 0.1289172 
## ... Procrustes: rmse 0.0001763503  max resid 0.0008883154 
## ... Similar to previous best
## Run 20 stress 0.1289182 
## ... Procrustes: rmse 0.0004332949  max resid 0.002179256 
## ... Similar to previous best
## *** Best solution repeated 3 times

# get NMDS 1-3
sp.cyclop.scores <- as.data.frame(scores(nmds.cyclop.rar, display = "sites"))
nmds.cyclop.rar.df <- merge(sample_data(cyclop.phylo), sp.cyclop.scores, by = 'row.names', all = TRUE)

#### pick envr. data and run the envfit function
env.cyclop<- nmds.cyclop.rar.df %>%
  dplyr::select(latitude, longitude, elevation_m, temp_C, chla_ug.L, pH, DO_perc, cond, DOC, TDN, TDP)

set.seed(123)
vf.cyclop <- envfit(nmds.cyclop.rar, env.cyclop, perm = 999, na.rm = TRUE) 
# lots of significance, except elevation, pH, DOC, TDN,    highest significance lat/long, spc, cond

### grab vectors and put into df
spp.cyclop.scrs <- as.data.frame(scores(vf.cyclop, display = "vectors")) *ordiArrowMul(vf.cyclop)
spp.cyclop.scrs.test <- cbind(spp.cyclop.scrs, r=vf.cyclop$vectors[2], pvals=vf.cyclop$vectors[4], 
                       Envfact = rownames(spp.cyclop.scrs))
names(spp.cyclop.scrs.test)[names(spp.cyclop.scrs.test) == 'r'] <- 'r2' # rename bc vector has naming issues

# subset to keep only those Envfactors that are significant 
spp.cyclop.scrs.signif<-spp.cyclop.scrs.test[spp.cyclop.scrs.test$pvals< 0.05, ]

# inspect
plot(nmds.cyclop.rar)
plot(vf.cyclop)

# make the plot
NMDS.cyclop <- ggplot(nmds.cyclop.rar.df) +
  geom_point(mapping = aes(x = NMDS1, y = NMDS2, colour = lake), size=2) +
  #coord_fixed(ratio=1.2) + 
  stat_ellipse(level=0.95, linetype = 2, linewidth=0.6, aes(x = NMDS1, y = NMDS2, color=lake)) +
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  ggtitle("Cyclopoids")+
  geom_segment(data = spp.cyclop.scrs.signif,
               aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
               arrow = arrow(length = unit(0.25, "cm")), colour = "goldenrod4") +
  geom_text(data = spp.cyclop.scrs.signif, aes(x = NMDS1, y = NMDS2, label = Envfact), color="goldenrod4",
            size = 2.5)+
  theme_classic()+
  annotate(geom="text", x=2, y=-1.75, label="Stress=0.129", size=3,
           color="black")


### permanova and betadisp
# run PERMANOVA
set.seed(781)
broad.model.cyclop <- adonis2(bc_cyclop ~ time_point * lake, 
                            data = nmds.cyclop.rar.df, permutations = 999, by="terms")
broad.model.cyclop # signif lake and time effect, driven by time 2
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = bc_cyclop ~ time_point * lake, data = nmds.cyclop.rar.df, permutations = 999, by = "terms")
##                 Df SumOfSqs      R2       F Pr(>F)    
## time_point       4   3.9758 0.17788 11.7166  0.001 ***
## lake             4   9.2302 0.41296 27.2016  0.001 ***
## time_point:lake 12   4.7341 0.21180  4.6505  0.001 ***
## Residual        52   4.4113 0.19736                   
## Total           72  22.3514 1.00000                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise.adonis(bc_cyclop, factors=nmds.cyclop.rar.df$lake, p.adjust.m="holm")
##                        pairs Df SumsOfSqs   F.Model        R2 p.value
## 1             Blue vs Cooney  1  1.270655  7.225351 0.1940976   0.001
## 2           Blue vs Virginia  1  1.552882  7.998121 0.1817832   0.001
## 3      Blue vs Eastern Brook  1  3.824528 21.553294 0.4263480   0.001
## 4            Blue vs Convict  1  2.146810 12.877258 0.3491924   0.001
## 5         Cooney vs Virginia  1  1.934353  9.704544 0.2326975   0.001
## 6    Cooney vs Eastern Brook  1  3.414846 18.826275 0.4295659   0.001
## 7          Cooney vs Convict  1  2.249562 13.272184 0.3988973   0.001
## 8  Virginia vs Eastern Brook  1  3.733851 18.524429 0.3740463   0.001
## 9        Virginia vs Convict  1  1.658378  8.448376 0.2452474   0.001
## 10  Eastern Brook vs Convict  1  2.953048 17.211118 0.4752993   0.001
##    p.adjusted sig
## 1        0.01   *
## 2        0.01   *
## 3        0.01   *
## 4        0.01   *
## 5        0.01   *
## 6        0.01   *
## 7        0.01   *
## 8        0.01   *
## 9        0.01   *
## 10       0.01   *

# Env model
env.model.cyclop <- adonis2(bc_cyclop ~ latitude + elevation_m + temp_C + pH + cond + 
                            DOC + chla_ug.L + DO_perc, data = nmds.cyclop.rar.df, permutations = 999)
env.model.cyclop # latitude is best (0.18), elevation (0.12), cond (0.06)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = bc_cyclop ~ latitude + elevation_m + temp_C + pH + cond + DOC + chla_ug.L + DO_perc, data = nmds.cyclop.rar.df, permutations = 999)
##          Df SumOfSqs      R2      F Pr(>F)    
## Model     8   11.501 0.51454 8.4791  0.001 ***
## Residual 64   10.851 0.48546                  
## Total    72   22.351 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# betadispersion
anova(betadisper(bc_cyclop, nmds.cyclop.rar.df$time)) # NS
## Analysis of Variance Table
## 
## Response: Distances
##           Df Sum Sq  Mean Sq F value Pr(>F)
## Groups     4 0.0508 0.012701  0.5815  0.677
## Residuals 68 1.4852 0.021841
anova(betadisper(bc_cyclop, nmds.cyclop.rar.df$lake)) # NS
## Analysis of Variance Table
## 
## Response: Distances
##           Df  Sum Sq  Mean Sq F value Pr(>F)
## Groups     4 0.08953 0.022383  0.8638 0.4903
## Residuals 68 1.76203 0.025912

# alternative means of finding best model
cyclo.rar.env<- nmds.cyclop.rar.df %>% dplyr::select(latitude, elevation_m, temp_C, pH, cond, DOC, chla_ug.L, DO_perc)
bioenv(bc_cyclop, cyclo.rar.env, scale.var = TRUE)
## 
## Call:
## bioenv(comm = bc_cyclop, env = cyclo.rar.env, scale.var = TRUE) 
## 
## Subset of environmental variables with best correlation to community data.
## 
## Correlations:    spearman 
## Dissimilarities: bray 
## Metric:          euclidean 
## 
## Best model has 3 parameters (max. 8 allowed):
## latitude pH cond
## with correlation  0.7084598
# latitude, pH, cond = 0.71
  • Lastly, run NMDS-envfit for Water
# subset
water.phylo<-subset_samples(PS.500, sample_type=="Water")
water.phylo <- prune_taxa(taxa_sums(water.phylo) >0, water.phylo) # remove those asvs with 0 sum

####################################
# first let's look at PERMANOVA for the full dataset
# rarefied
set.seed(1000)
bc_water <- phyloseq::distance(water.phylo, method = "bray") # rarefied

# run NMDS on Bray distance object (defined above for 'bc_water'), this is
set.seed(520)
nmds.water.rar <- metaMDS(bc_water,
          distance = "bray",
          k = 3,
          maxit = 999, 
          wascores = TRUE)
## Run 0 stress 0.08162354 
## Run 1 stress 0.08680888 
## Run 2 stress 0.08162367 
## ... Procrustes: rmse 0.0001121696  max resid 0.0002323592 
## ... Similar to previous best
## Run 3 stress 0.08162361 
## ... Procrustes: rmse 7.358572e-05  max resid 0.0002362312 
## ... Similar to previous best
## Run 4 stress 0.08162354 
## ... New best solution
## ... Procrustes: rmse 1.255076e-05  max resid 2.813865e-05 
## ... Similar to previous best
## Run 5 stress 0.08162356 
## ... Procrustes: rmse 0.0001425018  max resid 0.0003072561 
## ... Similar to previous best
## Run 6 stress 0.08162357 
## ... Procrustes: rmse 4.613887e-05  max resid 0.000131395 
## ... Similar to previous best
## Run 7 stress 0.08162366 
## ... Procrustes: rmse 0.0002184892  max resid 0.0004722827 
## ... Similar to previous best
## Run 8 stress 0.08162365 
## ... Procrustes: rmse 0.0002199381  max resid 0.0004651429 
## ... Similar to previous best
## Run 9 stress 0.08162356 
## ... Procrustes: rmse 0.0001355224  max resid 0.0002910289 
## ... Similar to previous best
## Run 10 stress 0.08162355 
## ... Procrustes: rmse 2.400453e-05  max resid 5.270339e-05 
## ... Similar to previous best
## Run 11 stress 0.08680874 
## Run 12 stress 0.08162361 
## ... Procrustes: rmse 8.499383e-05  max resid 0.0003282724 
## ... Similar to previous best
## Run 13 stress 0.08162354 
## ... New best solution
## ... Procrustes: rmse 8.765571e-05  max resid 0.0001854587 
## ... Similar to previous best
## Run 14 stress 0.08162367 
## ... Procrustes: rmse 0.0001417413  max resid 0.0003187578 
## ... Similar to previous best
## Run 15 stress 0.08162353 
## ... New best solution
## ... Procrustes: rmse 5.14543e-05  max resid 9.85746e-05 
## ... Similar to previous best
## Run 16 stress 0.08162359 
## ... Procrustes: rmse 0.000108919  max resid 0.0002576188 
## ... Similar to previous best
## Run 17 stress 0.08743131 
## Run 18 stress 0.08162356 
## ... Procrustes: rmse 6.80506e-05  max resid 0.0002054003 
## ... Similar to previous best
## Run 19 stress 0.08162356 
## ... Procrustes: rmse 7.706698e-05  max resid 0.0001638814 
## ... Similar to previous best
## Run 20 stress 0.0816236 
## ... Procrustes: rmse 0.0001228395  max resid 0.0002680078 
## ... Similar to previous best
## *** Best solution repeated 5 times


# get NMDS 1-3
sp.water.scores <- as.data.frame(scores(nmds.water.rar, display = "sites"))
nmds.water.rar.df <- merge(sample_data(water.phylo), sp.water.scores, by = 'row.names', all = TRUE)

#### pick envr. data and run the envfit function
env.water<- nmds.water.rar.df %>%
  dplyr::select(latitude, longitude, elevation_m, temp_C, chla_ug.L, pH, DO_perc, cond, DOC, TDN, TDP)

set.seed(123)
vf.water <- envfit(nmds.water.rar, env.water, perm = 999, na.rm = TRUE) 
# lots of significance, except elevation, pH, DOC, TDN,    highest significance lat/long, spc, cond

### grab vectors and put into df
spp.water.scrs <- as.data.frame(scores(vf.water, display = "vectors")) *ordiArrowMul(vf.water)
spp.water.scrs.test <- cbind(spp.water.scrs, r=vf.water$vectors[2], pvals=vf.water$vectors[4], 
                       Envfact = rownames(spp.water.scrs))
names(spp.water.scrs.test)[names(spp.water.scrs.test) == 'r'] <- 'r2' # rename bc vector has naming issues

# subset to keep only those Envfactors that are significant 
spp.water.scrs.signif<-spp.water.scrs.test[spp.water.scrs.test$pvals< 0.05, ]

# inspect
plot(nmds.water.rar)
plot(vf.water)

# make the plot
NMDS.water <- ggplot(nmds.water.rar.df) +
  geom_point(mapping = aes(x = NMDS1, y = NMDS2, colour = lake), size=2) +
  #coord_fixed(ratio=0.7) + 
  xlim(-1.3, 1.5)+
  ylim(-2.7,2.3)+
  stat_ellipse(level=0.95, linetype = 2, linewidth=0.6, aes(x = NMDS1, y = NMDS2, color=lake)) +
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  ggtitle("Water")+
  geom_segment(data = spp.water.scrs.signif,
               aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
               arrow = arrow(length = unit(0.25, "cm")), colour = "goldenrod4") +
  geom_text(data = spp.water.scrs.signif, aes(x = NMDS1, y = NMDS2, label = Envfact), color="goldenrod4",
            size = 2.5)+
  theme_classic() + 
  annotate(geom="text", x=1, y=-2.7, label="Stress=0.082", size=3,
           color="black")

### permanova and betadisp
# run PERMANOVA
set.seed(781)
broad.model.water <- adonis2(bc_water ~ time_point + lake, 
                            data = nmds.water.rar.df, permutations = 999)
broad.model.water # signif lake effect, time effects minimal and largely random
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = bc_water ~ time_point + lake, data = nmds.water.rar.df, permutations = 999)
##          Df SumOfSqs      R2      F Pr(>F)    
## Model     9   4.9146 0.65153 3.7393  0.001 ***
## Residual 18   2.6286 0.34847                  
## Total    27   7.5432 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pairwise.adonis(bc_water, factors=nmds.water.rar.df$lake, p.adjust.m="holm")
##                        pairs Df SumsOfSqs  F.Model        R2 p.value p.adjusted
## 1    Serene vs Eastern Brook  1 0.4816068 2.886099 0.2651178   0.006      0.078
## 2           Serene vs Cooney  1 1.0526944 6.887096 0.4959349   0.005      0.075
## 3          Serene vs Convict  1 1.2823636 8.167115 0.5051684   0.010      0.096
## 4         Serene vs Virginia  1 0.9487196 5.071612 0.3879867   0.012      0.096
## 5             Serene vs Blue  1 1.0207423 5.661405 0.4471388   0.013      0.096
## 6    Eastern Brook vs Cooney  1 0.8355006 5.843223 0.4549655   0.010      0.096
## 7   Eastern Brook vs Convict  1 1.1509832 7.756762 0.4922815   0.013      0.096
## 8  Eastern Brook vs Virginia  1 0.8529988 4.780480 0.3740454   0.016      0.096
## 9      Eastern Brook vs Blue  1 0.9062059 5.317036 0.4316815   0.008      0.096
## 10         Cooney vs Convict  1 0.7068356 5.366092 0.4339360   0.008      0.096
## 11        Cooney vs Virginia  1 0.4298500 2.588451 0.2699551   0.008      0.096
## 12            Cooney vs Blue  1 0.4743275 3.066697 0.3382375   0.037      0.096
## 13       Convict vs Virginia  1 0.5866652 3.480082 0.3031408   0.008      0.096
## 14           Convict vs Blue  1 0.6775808 4.256943 0.3781616   0.005      0.075
## 15          Virginia vs Blue  1 0.2880765 1.488670 0.1753714   0.161      0.161
##    sig
## 1     
## 2     
## 3     
## 4     
## 5     
## 6     
## 7     
## 8     
## 9     
## 10    
## 11    
## 12    
## 13    
## 14    
## 15

# Env model
env.model.water <- adonis2(bc_water ~ latitude + elevation_m + temp_C + pH + cond + 
                            DOC + chla_ug.L + DO_perc, data = nmds.water.rar.df, permutations = 999, by="terms")
env.model.water # latitude is best (0.19), elevation (0.18), cond (0.07), temp (0.06)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = bc_water ~ latitude + elevation_m + temp_C + pH + cond + DOC + chla_ug.L + DO_perc, data = nmds.water.rar.df, permutations = 999, by = "terms")
##             Df SumOfSqs      R2      F Pr(>F)    
## latitude     1   1.4021 0.18588 8.1939  0.001 ***
## elevation_m  1   1.2492 0.16560 7.3000  0.001 ***
## temp_C       1   0.4264 0.05653 2.4920  0.008 ** 
## pH           1   0.0929 0.01232 0.5432  0.955    
## cond         1   0.5758 0.07633 3.3649  0.001 ***
## DOC          1   0.1679 0.02225 0.9810  0.428    
## chla_ug.L    1   0.2167 0.02873 1.2663  0.189    
## DO_perc      1   0.1610 0.02135 0.9411  0.472    
## Residual    19   3.2512 0.43101                  
## Total       27   7.5432 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# betadispersion
anova(betadisper(bc_water, nmds.water.rar.df$time)) # NS
## Analysis of Variance Table
## 
## Response: Distances
##           Df   Sum Sq   Mean Sq F value Pr(>F)
## Groups     4 0.002961 0.0007404  0.1176 0.9749
## Residuals 23 0.144762 0.0062940
anova(betadisper(bc_water, nmds.water.rar.df$lake)) # NS
## Analysis of Variance Table
## 
## Response: Distances
##           Df   Sum Sq   Mean Sq F value Pr(>F)
## Groups     5 0.027613 0.0055226  0.7317 0.6074
## Residuals 22 0.166051 0.0075478

# alternative means of finding best model
wat.rar.env<- nmds.water.rar.df %>% dplyr::select(latitude, elevation_m, temp_C, pH, cond, DOC, chla_ug.L, DO_perc)
bioenv(bc_water, wat.rar.env, scale.var = TRUE)
## 
## Call:
## bioenv(comm = bc_water, env = wat.rar.env, scale.var = TRUE) 
## 
## Subset of environmental variables with best correlation to community data.
## 
## Correlations:    spearman 
## Dissimilarities: bray 
## Metric:          euclidean 
## 
## Best model has 2 parameters (max. 8 allowed):
## latitude cond
## with correlation  0.8043698
# latitude + cond = 0.81

Make an aggregate plot of the 4 NMDS-envs.

extract.legend <- get_legend(
  # create some space to the left of the legend
  NMDS.daph + theme(legend.box.margin = margin(0, 0, 0, 10)))

NMDS.env.species<-plot_grid(NMDS.daph + guides(color="none"), 
                            NMDS.calan + guides(color="none"),
                            NMDS.cyclop + guides(color="none"),
                            NMDS.water + guides(color="none"),
                            extract.legend,
          ncol=5, rel_widths = c(10,10,10,10,3), labels = c("A","B", "C", "D", ""))

NMDS.env.species
**Figure 8**. Non-metric multidimensional scaling (NMDS) ordination on Bray-Curtis dissimilarity matrix of bacterial communities associated with (A) Daphnia, (B) calanoid and (C) cyclopoid copepod zooplankton, or in (D) water across six lakes and five sampling time points.

Figure 8. Non-metric multidimensional scaling (NMDS) ordination on Bray-Curtis dissimilarity matrix of bacterial communities associated with (A) Daphnia, (B) calanoid and (C) cyclopoid copepod zooplankton, or in (D) water across six lakes and five sampling time points.

dev.copy(pdf, "figures/Fig8.NMDS.env.specieswater.pdf", height=4, width=13)
dev.off()

NMDS by taxa-space-time

This code runs NMDS plots for all the data with different binning factors. May need to reduce and/or decide if PCoA or NMDS is best for showing “full community differences”, which presently is in both places.

nmds.organism <- ggplot(nmds.rar.df, aes(x=NMDS1, y=NMDS2, color = organism)) +
  geom_point(aes(color = organism, shape=sample_type), size = 1.3) + 
  stat_ellipse(level=0.95, lwd = 0.5, linetype = 2, aes(color=organism)) +
  xlim(-2.4,2.25)+
  ylim(-2.2,2.4)+
  scale_color_manual(values=c("coral2", "gold3", "darkolivegreen3", "darkslategray4", "dodgerblue")) +
  scale_shape_manual(values=c(1,16))+
  ggtitle("Organism") +
  theme_classic()


nmds.phy_group <- ggplot(nmds.rar.df, aes(x=NMDS1, y=NMDS2, color = phy_group)) +
  geom_point(aes(color = phy_group, shape=sample_type), size = 1.3) + 
  stat_ellipse(level=0.95, lwd = 0.5, linetype = 2, aes(color=phy_group)) +
  theme_classic()+
  xlim(-2.4,2.25)+
  ylim(-2.2,2.4)+
  scale_color_manual(values = c("indianred3","darkolivegreen4", "dodgerblue3")) +
  scale_shape_manual(values=c(1,16))+
  ggtitle("Taxonomic Group")


nmds.rar.df$Date<-as.factor(nmds.rar.df$time_point)
nmds.rar.df$Date<-recode_factor(nmds.rar.df$Date, 
                                "1" = "25-Jun",
                                "2" = "6-Jul",
                                "3" = "21-Jul",
                                "4" = "4-Aug",
                                "5" = "22-Aug")

nmds.time <- ggplot(nmds.rar.df, aes(x=NMDS1, y=NMDS2)) +
  geom_point(aes(color = Date, shape=sample_type), size = 1.3) + 
  stat_ellipse(level=0.95, lwd = 0.5, linetype = 2, aes(color=Date)) +
  scale_shape_manual(values=c(1,16))+
  xlim(-2.4,2.25)+
  ylim(-2.2,2.4)+
  scale_color_manual(values=c("indianred4", "indianred1", "orange", "palevioletred", "thistle")) + 
  ggtitle("Sampling Time Point") +
  theme_classic()
 
# plot by lake and basin and rename the factors
nmds.rar.df$int.basin.type<-interaction(nmds.rar.df$basin,nmds.rar.df$sample_type)
# recode factors
nmds.rar.df$int.basin.type <- recode_factor(nmds.rar.df$int.basin.type,
                                     North.Water = "N.Water",
                                     South.Water = "S.Water",
                                     North.Zooplankton = "N.Zoop",
                                     South.Zooplankton = "S.Zoop")


nmds.lake.basin <- ggplot(nmds.rar.df, aes(x=NMDS1, y=NMDS2)) +
  geom_point(aes(color = lake, shape=int.basin.type), size = 1.3) + 
  stat_ellipse(level=0.95, lwd = 0.5, linetype = 2, aes(color=lake)) +
  scale_shape_manual(values=c(0,2,15,17)) +
  xlim(-2.4,2.25)+
  ylim(-2.2,2.4)+
  scale_color_manual(values =c("darkslategray", "dodgerblue3", "#63A5CA", "lightsteelblue4","lightblue3", "#8BDDC4"))+
  ggtitle("Lake-Basin") +
  theme_classic()


phy_group_plots_combined <- plot_grid(
  nmds.organism, nmds.phy_group, nmds.time, nmds.lake.basin,  rel_widths = c(3,3,3,3,3),
  nrow=2, ncol=2, labels = c("A","B","C","D"))

phy_group_plots_combined
**Figure 7**. Bacterial communities associated with zooplankton and in water of six mountain lakes across five sampling time points. Non-metric multidimensional scaling (NMDS) ordination plots grouped by (A) host organism (species or functional group), (B) taxonomic group, (C) sampling time points, and (D) across lake-basins, being in the north or south.

Figure 7. Bacterial communities associated with zooplankton and in water of six mountain lakes across five sampling time points. Non-metric multidimensional scaling (NMDS) ordination plots grouped by (A) host organism (species or functional group), (B) taxonomic group, (C) sampling time points, and (D) across lake-basins, being in the north or south.

dev.print(pdf, "figures/Fig7.NMDS.groupeffects.pdf", height=8, width=10)
dev.off()

Stacked bar

Relative abundance stacked bar plot for microbial community

########### plotting relative abundance, help from Megan Demmel, this is what I will use for analysis of relative abundances/ composition
# PS.500 %>%  plot_composition()

# unload("MicrobiotaProcess") if there is an issue with the taxa table, this is the source (conflict)
pm <- psmelt(PS.500)
Abund <- aggregate(Abundance ~ sample_type + organism + phy_group + lake + time_point + sampleNames + Class, data=pm, FUN=sum)
sam <- aggregate(Abundance ~ sample_type + organism + phy_group + lake + time_point +sampleNames, data=pm, FUN=sum) # this should include everything Abund has except for Class

colnames(sam)[ncol(sam)] <- "tot.abund" # renaming the last column 'tot.abund' so that we can differentiate that from rel.abund

Abund.final <- dplyr::full_join(Abund, sam)
Abund.final$rel_abund <- Abund.final$Abundance/Abund.final$tot.abund

# classes that have rel.abund less than 5% will be named as other
Abund.final$Class[Abund.final$rel_abund <= 0.05] <- "Other (<5%)" 

# to avoid Quartz opening in Mac OS
Show <- function(x){
    invisible(
        if (interactive()){
            View(x, deparse(substitute(x)))
        }
    )
}

# now "view or show" the table
Show(Abund.final)

Abund.final$Class<-as.factor(Abund.final$Class)
Abund.final$Class<-factor(Abund.final$Class, 
                          levels=c("Acidimicrobiia", "Actinobacteria", "Alphaproteobacteria",
                              "Bacilli", "Bacteroidia", "Cyanobacteriia", "Gammaproteobacteria", "Saccharimonadia",
                               "Verrucomicrobiae", "Other (<5%)"))

Abund.final$Date<-as.factor(Abund.final$time_point)
Abund.final$Date<-recode_factor(Abund.final$Date, 
                                "1" = "25-Jun",
                                "2" = "6-Jul",
                                "3" = "21-Jul",
                                "4" = "4-Aug",
                                "5" = "22-Aug")

sum.rel<-aggregate(rel_abund~phy_group:Class:lake:Date:sampleNames, data=Abund.final, FUN=sum)

# rel abund plot across time, lake, and sample type
Rel.abund.class <- ggplot(Abund.final, aes(x = Date, y = rel_abund, fill= Class, color = Class)) +
  geom_bar(width = 0.8, aes(fill = Class, color = Class), stat ="identity", position = "fill") +
  facet_grid(cols = vars(lake), rows = vars(phy_group)) + 
  xlab("Sampling Time Point") +  
  ylab("Relative Abundance") +
  theme_bw() +
  theme(axis.text.x=element_text(size=8, angle = 75, vjust = 0.5, hjust=1), axis.title.x=element_text(size=13)) +
  theme(axis.text.y=element_text(size=11), axis.title.y=element_text(size=13)) +
  theme(strip.text.x = element_text(size = 13), strip.text.y = element_text(size = 13)) +
  scale_fill_manual(values = c("red4", "orangered", "gold", "lightsalmon", "seagreen4", "lightgreen","lightskyblue", "royalblue3", "plum", "tan3")) +
  scale_color_manual(values = c("red4", "orangered", "gold", "lightsalmon", "seagreen4", "lightgreen","lightskyblue", "royalblue3", "plum", "tan3"))

Rel.abund.class
dev.copy(pdf, "figures/Fig6.Relabund.facet.pdf", height=10, width=12)
dev.off()
**Figure 6**. Relative abundance of bacteria classes among Cladocera (Daphnia, Holopedium) and Copepoda (calanoid and cyclopoid copepods) zooplankton and water in six lakes of the Eastern Sierra Nevada mountains at five time points across the summer season.

Figure 6. Relative abundance of bacteria classes among Cladocera (Daphnia, Holopedium) and Copepoda (calanoid and cyclopoid copepods) zooplankton and water in six lakes of the Eastern Sierra Nevada mountains at five time points across the summer season.

# if merged across all samples at organism level...
merged.PS.500.organism <- merge_samples(PS.500, "organism") # merge by organisms, pooling across all levels
sample_data(merged.PS.500.organism)$organism<-factor(c("Daphnia", "Holopedium", "Calanoid", "Cyclopoid", "Water"))
sample_data(merged.PS.500.organism)$organism<-factor(sample_data(merged.PS.500.organism)$organism,
                                  levels=c("Daphnia", "Holopedium", "Calanoid", "Cyclopoid", "Water"))


normalized_ps_family.org = tax_glom(merged.PS.500.organism, "Family", NArm = TRUE)
norma_ps_family_relabun_org = transform_sample_counts(normalized_ps_family.org, function(OTU) OTU/sum(OTU))

ps.abund.fam.org<-psmelt(norma_ps_family_relabun_org)
ps.abund.fam.org<-ps.abund.fam.org %>%
   dplyr::rename("rel_abund" = "Abundance") 

# families that have rel.abund less than 10% will be named as other
ps.abund.fam.org$Family[ps.abund.fam.org$rel_abund <= 0.05] <- "Other (<5%)" 

ps.abund.fam.org$Fam.clas.other<-paste(ps.abund.fam.org$Class, ps.abund.fam.org$Family, sep="-")
ps.abund.fam.org$Fam.clas.other<-ifelse(
                  ps.abund.fam.org$Family == "Other (<5%)", "Other (<5%)", ps.abund.fam.org$Fam.clas.other)

ps.abund.fam.org$Fam.clas.other<-fct_relevel(ps.abund.fam.org$Fam.clas.other, 'Other (<5%)', after=Inf)

  
#### Family-Class level dominance in each group (zooplankton or water samples)

fam.org.dom<-ps.abund.fam.org[!(ps.abund.fam.org$Fam.clas.other=="Other (<5%)"),]
fam.org.dom.sum<- fam.org.dom %>%
  dplyr::select(organism, rel_abund, Fam.clas.other)


ggplot(ps.abund.fam.org, aes(x = organism, y = rel_abund, fill= Fam.clas.other, color = Fam.clas.other)) +
  geom_bar(width = 0.8, aes(fill = Fam.clas.other, color = Fam.clas.other),
           stat ="identity", position = "fill") +
  xlab("Zooplankton and Water Samples") +  
  ylab("Relative Abundance") +
  theme_bw() +
  theme(axis.text.x=element_text(size=8, vjust = 0.5, hjust=1), axis.title.x=element_text(size=13)) +
  theme(axis.text.y=element_text(size=11), axis.title.y=element_text(size=13)) +
  theme(strip.text.x = element_text(size = 13), strip.text.y = element_text(size = 13)) +
   scale_fill_manual(values = c("orangered",
                               "goldenrod1",
                               "springgreen4",  "limegreen","yellowgreen",
      "deepskyblue4", "steelblue", "cornflowerblue", "lightskyblue","turquoise", "tan3")) +
  scale_color_manual(values = c("orangered",
                               "goldenrod2",
                               "springgreen4", "limegreen", "yellowgreen",
      "deepskyblue4",  "steelblue", "cornflowerblue", "lightskyblue", "turquoise","tan3"))

dev.copy(pdf, "figures/FigS4.Relabund.class.fam.pdf", height=8, width=10)
dev.off()
**Figure S4**. Relative abundance of bacterial taxa Families among Daphnia, Holopedium, calanoid and cyclopoid copepods, and bacterioplankton in water from lakes of the Eastern Sierra Nevada mountains.

Figure S4. Relative abundance of bacterial taxa Families among Daphnia, Holopedium, calanoid and cyclopoid copepods, and bacterioplankton in water from lakes of the Eastern Sierra Nevada mountains.

Bacteria abund-by-Environ.

Use the relative abundance data and plot the main taxonomic groups of bacteria against environmental predictors.
- We will use: Gammaproteobacteria, Alphaproteobacteria, Actinobacteria, Bacteroidia.
- plots are combined with similar analyses of Shannon diversity and environmental predictors.
- first run multiple linear regressions to identify drivers.
- these outputs are in Table S5.

# using Abund.final from the last code chunk
# testing the effects of time, lake, and sample type on the 5 most dominant classes of bacteria

############ GAMMAPROTEOBACTERIA
gammaproteobacteria <- subset(Abund.final, Class == "Gammaproteobacteria")
gammaproteobacteria <- summaryBy(rel_abund ~ sampleNames, data = gammaproteobacteria, FUN = sum)
gammaproteobacteria <- data.frame(gammaproteobacteria, row.names = gammaproteobacteria[,1])
names(gammaproteobacteria)[names(gammaproteobacteria) == 'rel_abund.sum'] <- 'Gamma.rel_abund.sum'
gammaproteobacteria <- merge(gammaproteobacteria, sample_data(PS.500), by = "row.names")
names(gammaproteobacteria)[names(gammaproteobacteria) == 'sampleNames.x'] <- 'sampleNames'

#run a model
Gam.model<-lm(Gamma.rel_abund.sum ~ time_point + lake + phy_group, data = gammaproteobacteria)
Anova(Gam.model, type=2)
## Anova Table (Type II tests)
## 
## Response: Gamma.rel_abund.sum
##            Sum Sq  Df F value    Pr(>F)    
## time_point 1.1471   4  9.2244 5.441e-07 ***
## lake       2.6478   5 17.0342 1.349e-14 ***
## phy_group  4.5053   2 72.4596 < 2.2e-16 ***
## Residuals  8.2074 264                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(allEffects(Gam.model))
# time, lake, phy.group all matter

# separate sample types
gamma.wat<-gammaproteobacteria[(gammaproteobacteria$sample_type=="Water"),]
gamma.zoop<-gammaproteobacteria[(gammaproteobacteria$sample_type=="Zooplankton"),]

#### water gamma
# testing environmental effects on gamaproteobacteria abundances
FitAll.gamma.wat <- lm(Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + 
                         pH + DO_perc + cond, data = gamma.wat)
stepAIC(FitAll.gamma.wat, direction = "both")
## Start:  AIC=-171.47
## Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + DO_perc + cond
## 
##               Df Sum of Sq      RSS     AIC
## - DO_perc      1 0.0000382 0.032276 -173.44
## - log(DOC)     1 0.0001506 0.032388 -173.34
## - elevation_m  1 0.0003155 0.032553 -173.20
## <none>                     0.032237 -171.47
## - latitude     1 0.0027993 0.035037 -171.14
## - cond         1 0.0048822 0.037120 -169.52
## - pH           1 0.0058742 0.038112 -168.78
## - chla_ug.L    1 0.0075829 0.039820 -167.56
## - temp_C       1 0.0148371 0.047075 -162.87
## 
## Step:  AIC=-173.44
## Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + cond
## 
##               Df Sum of Sq      RSS     AIC
## - log(DOC)     1 0.0001430 0.032419 -175.31
## - elevation_m  1 0.0003985 0.032674 -175.09
## <none>                     0.032276 -173.44
## - latitude     1 0.0029141 0.035190 -173.02
## + DO_perc      1 0.0000382 0.032237 -171.47
## - cond         1 0.0049977 0.037273 -171.41
## - pH           1 0.0059726 0.038248 -170.68
## - chla_ug.L    1 0.0081986 0.040474 -169.10
## - temp_C       1 0.0152550 0.047531 -164.60
## 
## Step:  AIC=-175.31
## Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + chla_ug.L + 
##     pH + cond
## 
##               Df Sum of Sq      RSS     AIC
## - elevation_m  1 0.0002749 0.032694 -177.08
## <none>                     0.032419 -175.31
## - latitude     1 0.0038283 0.036247 -174.19
## + log(DOC)     1 0.0001430 0.032276 -173.44
## + DO_perc      1 0.0000305 0.032388 -173.34
## - cond         1 0.0053893 0.037808 -173.01
## - pH           1 0.0063092 0.038728 -172.34
## - chla_ug.L    1 0.0082267 0.040645 -170.98
## - temp_C       1 0.0151261 0.047545 -166.59
## 
## Step:  AIC=-177.08
## Gamma.rel_abund.sum ~ latitude + temp_C + chla_ug.L + pH + cond
## 
##               Df Sum of Sq      RSS     AIC
## <none>                     0.032694 -177.08
## + elevation_m  1 0.0002749 0.032419 -175.31
## + DO_perc      1 0.0001028 0.032591 -175.17
## + log(DOC)     1 0.0000195 0.032674 -175.09
## - pH           1 0.0060398 0.038733 -174.33
## - chla_ug.L    1 0.0096055 0.042299 -171.87
## - temp_C       1 0.0148556 0.047549 -168.59
## - cond         1 0.0242578 0.056951 -163.54
## - latitude     1 0.0255802 0.058274 -162.90
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ latitude + temp_C + chla_ug.L + 
##     pH + cond, data = gamma.wat)
## 
## Coefficients:
## (Intercept)     latitude       temp_C    chla_ug.L           pH         cond  
##    5.670540    -0.140804    -0.015461    -0.065274     0.029497    -0.001039

# best model
Best.gamma.wat<-lm(Gamma.rel_abund.sum ~ latitude + temp_C + chla_ug.L + pH + cond, data = gamma.wat)
summary(Best.gamma.wat)
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ latitude + temp_C + chla_ug.L + 
##     pH + cond, data = gamma.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.077678 -0.016542 -0.004509  0.020883  0.088956 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.6705401  1.2724381   4.456 0.000198 ***
## latitude    -0.1408037  0.0339377  -4.149 0.000419 ***
## temp_C      -0.0154614  0.0048902  -3.162 0.004521 ** 
## chla_ug.L   -0.0652743  0.0256745  -2.542 0.018561 *  
## pH           0.0294966  0.0146312   2.016 0.056172 .  
## cond        -0.0010386  0.0002571  -4.040 0.000547 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03855 on 22 degrees of freedom
## Multiple R-squared:  0.7665, Adjusted R-squared:  0.7135 
## F-statistic: 14.45 on 5 and 22 DF,  p-value: 2.525e-06
Anova(Best.gamma.wat, type=2) # conductivity is most important, followed by elevation_m, temp
## Anova Table (Type II tests)
## 
## Response: Gamma.rel_abund.sum
##             Sum Sq Df F value    Pr(>F)    
## latitude  0.025580  1 17.2133 0.0004194 ***
## temp_C    0.014856  1  9.9966 0.0045213 ** 
## chla_ug.L 0.009605  1  6.4637 0.0185614 *  
## pH        0.006040  1  4.0643 0.0561723 .  
## cond      0.024258  1 16.3235 0.0005468 ***
## Residuals 0.032694 22                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(lm(Gamma.rel_abund.sum ~ cond , data = gamma.wat)) #p<0.001, R2=0.436 <<< best
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ cond, data = gamma.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.085324 -0.034400 -0.003066  0.029867  0.144564 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.2837306  0.0189160  15.000 2.58e-14 ***
## cond        -0.0010513  0.0002246  -4.681 7.81e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05406 on 26 degrees of freedom
## Multiple R-squared:  0.4573, Adjusted R-squared:  0.4364 
## F-statistic: 21.91 on 1 and 26 DF,  p-value: 7.806e-05
summary(lm(Gamma.rel_abund.sum ~ latitude , data = gamma.wat)) #p=0.007, R2=0.22
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ latitude, data = gamma.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.132750 -0.053250  0.000426  0.037805  0.163574 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  4.89309    1.58861   3.080  0.00484 **
## latitude    -0.12408    0.04208  -2.949  0.00667 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06353 on 26 degrees of freedom
## Multiple R-squared:  0.2506, Adjusted R-squared:  0.2218 
## F-statistic: 8.694 on 1 and 26 DF,  p-value: 0.006667
summary(lm(Gamma.rel_abund.sum ~ temp_C , data = gamma.wat)) #p=0.056, R2=0.100
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ temp_C, data = gamma.wat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.09987 -0.04223 -0.01901  0.03947  0.16168 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.404311   0.098299   4.113 0.000348 ***
## temp_C      -0.011755   0.005872  -2.002 0.055812 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06831 on 26 degrees of freedom
## Multiple R-squared:  0.1336, Adjusted R-squared:  0.1002 
## F-statistic: 4.008 on 1 and 26 DF,  p-value: 0.05581
anova(lm(Gamma.rel_abund.sum ~ chla_ug.L , data = gamma.wat)) #NS
## Analysis of Variance Table
## 
## Response: Gamma.rel_abund.sum
##           Df   Sum Sq   Mean Sq F value Pr(>F)
## chla_ug.L  1 0.012109 0.0121092  2.4613 0.1288
## Residuals 26 0.127917 0.0049199
anova(lm(Gamma.rel_abund.sum ~ pH , data = gamma.wat)) #NS
## Analysis of Variance Table
## 
## Response: Gamma.rel_abund.sum
##           Df   Sum Sq  Mean Sq F value Pr(>F)
## pH         1 0.000303 0.000303  0.0564 0.8142
## Residuals 26 0.139724 0.005374

#### zoop gamma
# testing environmental effects on gamaproteobacteria abundances
FitAll.gamma.zoop <- lm(Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + 
                         pH + DO_perc + cond, data = gamma.zoop)
stepAIC(FitAll.gamma.zoop, direction = "both")
## Start:  AIC=-791.21
## Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + DO_perc + cond
## 
##               Df Sum of Sq     RSS     AIC
## - log(DOC)     1   0.02076  9.5130 -792.67
## <none>                      9.4922 -791.21
## - temp_C       1   0.09609  9.5883 -790.71
## - pH           1   0.10842  9.6007 -790.40
## - chla_ug.L    1   0.24849  9.7407 -786.80
## - cond         1   0.57102 10.0633 -778.73
## - DO_perc      1   1.03248 10.5247 -767.61
## - elevation_m  1   1.43743 10.9297 -758.24
## - latitude     1   2.43517 11.9274 -736.58
## 
## Step:  AIC=-792.67
## Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + chla_ug.L + 
##     pH + DO_perc + cond
## 
##               Df Sum of Sq     RSS     AIC
## <none>                      9.5130 -792.67
## - pH           1   0.09404  9.6071 -792.23
## - temp_C       1   0.09621  9.6092 -792.18
## + log(DOC)     1   0.02076  9.4922 -791.21
## - chla_ug.L    1   0.22776  9.7408 -788.80
## - cond         1   0.96967 10.4827 -770.60
## - DO_perc      1   1.03237 10.5454 -769.12
## - elevation_m  1   1.89852 11.4115 -749.54
## - latitude     1   2.94874 12.4618 -727.71
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + 
##     chla_ug.L + pH + DO_perc + cond, data = gamma.zoop)
## 
## Coefficients:
## (Intercept)     latitude  elevation_m       temp_C    chla_ug.L           pH  
##  33.6470081   -0.9981432    0.0009514    0.0136754   -0.1049181   -0.0436603  
##     DO_perc         cond  
##   0.0227697    0.0044949

# best model
Best.gamma.zoop<-lm(Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + chla_ug.L + pH + DO_perc + cond, data = gamma.zoop)
summary(Best.gamma.zoop)
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ latitude + elevation_m + temp_C + 
##     chla_ug.L + pH + DO_perc + cond, data = gamma.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56433 -0.14301 -0.00228  0.14946  0.39843 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.6470081  3.8372058   8.769 3.39e-16 ***
## latitude    -0.9981432  0.1157252  -8.625 8.86e-16 ***
## elevation_m  0.0009514  0.0001375   6.921 4.07e-11 ***
## temp_C       0.0136754  0.0087779   1.558   0.1206    
## chla_ug.L   -0.1049181  0.0437686  -2.397   0.0173 *  
## pH          -0.0436603  0.0283451  -1.540   0.1248    
## DO_perc      0.0227697  0.0044616   5.103 6.78e-07 ***
## cond         0.0044949  0.0009088   4.946 1.42e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1991 on 240 degrees of freedom
## Multiple R-squared:  0.2589, Adjusted R-squared:  0.2373 
## F-statistic: 11.98 on 7 and 240 DF,  p-value: 4.305e-13
Anova(Best.gamma.zoop, type=2) # conductivity is most important, p=0.025, R2=0.016
## Anova Table (Type II tests)
## 
## Response: Gamma.rel_abund.sum
##             Sum Sq  Df F value    Pr(>F)    
## latitude    2.9487   1 74.3927 8.859e-16 ***
## elevation_m 1.8985   1 47.8970 4.066e-11 ***
## temp_C      0.0962   1  2.4272   0.12056    
## chla_ug.L   0.2278   1  5.7461   0.01729 *  
## pH          0.0940   1  2.3726   0.12480    
## DO_perc     1.0324   1 26.0453 6.784e-07 ***
## cond        0.9697   1 24.4634 1.425e-06 ***
## Residuals   9.5130 240                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#latitude
summary(lm(Gamma.rel_abund.sum ~ latitude , data = gamma.zoop)) # R2=0.073, p<0.001
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ latitude, data = gamma.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.60806 -0.16426 -0.01761  0.17997  0.44483 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.95130    1.85244   4.832 2.38e-06 ***
## latitude    -0.22125    0.04901  -4.514 9.85e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2195 on 246 degrees of freedom
## Multiple R-squared:  0.07651,    Adjusted R-squared:  0.07275 
## F-statistic: 20.38 on 1 and 246 DF,  p-value: 9.85e-06

#conductivity 
summary(lm(Gamma.rel_abund.sum ~ cond , data = gamma.zoop)) #signif, R2=0.016, p=0.025
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ cond, data = gamma.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57201 -0.17758 -0.00995  0.20588  0.40238 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.6445369  0.0284941  22.620   <2e-16 ***
## cond        -0.0007366  0.0003255  -2.263   0.0245 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2261 on 246 degrees of freedom
## Multiple R-squared:  0.02039,    Adjusted R-squared:  0.01641 
## F-statistic: 5.121 on 1 and 246 DF,  p-value: 0.02451

#elevation NS
summary(lm(Gamma.rel_abund.sum ~ elevation_m , data = gamma.zoop)) #NS
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ elevation_m, data = gamma.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.53881 -0.17695 -0.00659  0.19804  0.38519 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 4.059e-01  1.568e-01   2.588   0.0102 *
## elevation_m 6.174e-05  5.271e-05   1.171   0.2427  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2278 on 246 degrees of freedom
## Multiple R-squared:  0.005545,   Adjusted R-squared:  0.001503 
## F-statistic: 1.372 on 1 and 246 DF,  p-value: 0.2427

#chla NS
summary(lm(Gamma.rel_abund.sum ~ chla_ug.L , data = gamma.zoop)) #NS
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ chla_ug.L, data = gamma.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56136 -0.17127 -0.00841  0.19386  0.40443 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.56209    0.02523  22.282   <2e-16 ***
## chla_ug.L    0.03824    0.02956   1.294    0.197    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2277 on 246 degrees of freedom
## Multiple R-squared:  0.006757,   Adjusted R-squared:  0.002719 
## F-statistic: 1.674 on 1 and 246 DF,  p-value: 0.197

#DO NS
summary(lm(Gamma.rel_abund.sum ~ DO_perc , data = gamma.zoop)) #NS
## 
## Call:
## lm(formula = Gamma.rel_abund.sum ~ DO_perc, data = gamma.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.54802 -0.17426 -0.01188  0.20164  0.39933 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.852710   0.231465   3.684 0.000282 ***
## DO_perc     -0.003536   0.003096  -1.142 0.254465    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2278 on 246 degrees of freedom
## Multiple R-squared:  0.005276,   Adjusted R-squared:  0.001232 
## F-statistic: 1.305 on 1 and 246 DF,  p-value: 0.2545

### plots for gamma with by sample types
gamma_cond <- ggplot(gammaproteobacteria, aes(x=cond, y=Gamma.rel_abund.sum, 
                                              color=sample_type, fill=sample_type, linetrype=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  scale_linetype_manual(values = c("dashed", "solid"))+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  geom_point(size=1.5, alpha=0.6)+
  ylim(c(0,1))+
  xlab("Conductivity(uS/cm)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16))+
  ggtitle("Gammaproteobacteria") + theme_classic()+
  annotate(geom="text", x=60, y=1.0, label="Zoop. Adj-R2=0.025, p=0.025", size=2, color="black") +
  annotate(geom="text", x=60, y=0.1, label="Water Adj-R2=0.436, p<0.001", size=2, color="black")
gamma_cond

# latitude
gamma_lat <- ggplot(gammaproteobacteria, aes(x=latitude, y=Gamma.rel_abund.sum, 
                                        color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  geom_point(size=1.5, alpha=0.6)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  ylim(c(0,1))+
  xlab("Latitude (o N)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16))+
  ggtitle("Gammaproteobacteria") + theme_classic()+
  annotate(geom="text", x=37.8, y=0.0, label="Water Adj-R2=0.220, p=0.007", size=2, color="black")+ 
  annotate(geom="text", x=37.8, y=0.5, label="Zoop Adj-R2=0.073, p<0.001", size=2, color="black")
gamma_lat


########### ALPHAPROTEOBACTERIA
alphaproteobacteria <- subset(Abund.final, Class == "Alphaproteobacteria")
alphaproteobacteria <- summaryBy(rel_abund ~ sampleNames, data = alphaproteobacteria, FUN = sum)
alphaproteobacteria <- data.frame(alphaproteobacteria, row.names = alphaproteobacteria[,1])
names(alphaproteobacteria)[names(alphaproteobacteria) == 'rel_abund.sum'] <- 'Alphapr.rel_abund.sum'
alphaproteobacteria <- merge(alphaproteobacteria, sample_data(PS.500), by = "row.names")
names(alphaproteobacteria)[names(alphaproteobacteria) == 'sampleNames.x'] <- 'sampleNames'

#run a model
Alpha.model<-lm(Alphapr.rel_abund.sum ~ time_point + lake + phy_group, data = alphaproteobacteria)
Anova(Alpha.model, type=2)
## Anova Table (Type II tests)
## 
## Response: Alphapr.rel_abund.sum
##            Sum Sq  Df F value    Pr(>F)    
## time_point 0.3922   4  3.4780  0.009843 ** 
## lake       0.8814   5  6.2539 3.156e-05 ***
## phy_group  0.4484   2  7.9536  0.000553 ***
## Residuals  3.6362 129                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(allEffects(Alpha.model))
# time, lake, phy.group all matter

# separate sample types
alph.wat<-alphaproteobacteria[(alphaproteobacteria$sample_type=="Water"),]
alph.zoop<-alphaproteobacteria[(alphaproteobacteria$sample_type=="Zooplankton"),]

# testing environmental effects on alphaproteobacteria abundances
FitAll.alpha.wat<- lm(Alphapr.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + 
                         pH + DO_perc + cond, data = alph.wat)
stepAIC(FitAll.alpha.wat, direction = "both") # best is temp_C + DO_perc + cond 
## Start:  AIC=-137.47
## Alphapr.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + DO_perc + cond
## 
##               Df Sum of Sq      RSS     AIC
## - pH           1 0.0000828 0.026759 -139.40
## - log(DOC)     1 0.0004503 0.027127 -139.08
## - cond         1 0.0005574 0.027234 -138.99
## - chla_ug.L    1 0.0011838 0.027860 -138.47
## - latitude     1 0.0014704 0.028147 -138.23
## - elevation_m  1 0.0015287 0.028205 -138.19
## - temp_C       1 0.0018697 0.028546 -137.91
## <none>                     0.026677 -137.47
## - DO_perc      1 0.0035723 0.030249 -136.58
## 
## Step:  AIC=-139.4
## Alphapr.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + DO_perc + cond
## 
##               Df Sum of Sq      RSS     AIC
## - log(DOC)     1 0.0004042 0.027164 -141.05
## - cond         1 0.0005701 0.027329 -140.91
## - chla_ug.L    1 0.0011279 0.027887 -140.45
## - elevation_m  1 0.0014572 0.028217 -140.18
## - latitude     1 0.0014675 0.028227 -140.17
## <none>                     0.026759 -139.40
## - temp_C       1 0.0035771 0.030336 -138.51
## - DO_perc      1 0.0036839 0.030443 -138.43
## + pH           1 0.0000828 0.026677 -137.47
## 
## Step:  AIC=-141.05
## Alphapr.rel_abund.sum ~ latitude + elevation_m + temp_C + chla_ug.L + 
##     DO_perc + cond
## 
##               Df Sum of Sq      RSS     AIC
## - cond         1 0.0002778 0.027441 -142.82
## - chla_ug.L    1 0.0007324 0.027896 -142.44
## - latitude     1 0.0015558 0.028719 -141.77
## - elevation_m  1 0.0020623 0.029226 -141.37
## <none>                     0.027164 -141.05
## - temp_C       1 0.0031729 0.030336 -140.51
## - DO_perc      1 0.0042160 0.031380 -139.73
## + log(DOC)     1 0.0004042 0.026759 -139.40
## + pH           1 0.0000367 0.027127 -139.08
## 
## Step:  AIC=-142.82
## Alphapr.rel_abund.sum ~ latitude + elevation_m + temp_C + chla_ug.L + 
##     DO_perc
## 
##               Df Sum of Sq      RSS     AIC
## - chla_ug.L    1 0.0007970 0.028238 -144.16
## <none>                     0.027441 -142.82
## - temp_C       1 0.0036828 0.031124 -141.92
## - DO_perc      1 0.0042939 0.031735 -141.47
## + cond         1 0.0002778 0.027164 -141.05
## + log(DOC)     1 0.0001118 0.027330 -140.91
## + pH           1 0.0000602 0.027381 -140.87
## - latitude     1 0.0068356 0.034277 -139.70
## - elevation_m  1 0.0191474 0.046589 -132.64
## 
## Step:  AIC=-144.16
## Alphapr.rel_abund.sum ~ latitude + elevation_m + temp_C + DO_perc
## 
##               Df Sum of Sq      RSS     AIC
## <none>                     0.028238 -144.16
## - temp_C       1 0.0038478 0.032086 -143.22
## + chla_ug.L    1 0.0007970 0.027441 -142.82
## + cond         1 0.0003423 0.027896 -142.44
## + pH           1 0.0000449 0.028193 -142.20
## + log(DOC)     1 0.0000254 0.028213 -142.18
## - DO_perc      1 0.0069143 0.035153 -141.12
## - latitude     1 0.0074952 0.035734 -140.75
## - elevation_m  1 0.0236125 0.051851 -132.18
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ latitude + elevation_m + 
##     temp_C + DO_perc, data = alph.wat)
## 
## Coefficients:
## (Intercept)     latitude  elevation_m       temp_C      DO_perc  
##  -2.6204336    0.0921104   -0.0001452    0.0065497   -0.0057536

# best model
Best.alpha.wat<-lm(Alphapr.rel_abund.sum ~  latitude + elevation_m + temp_C + DO_perc, data = alph.wat)
summary(Best.alpha.wat)
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ latitude + elevation_m + 
##     temp_C + DO_perc, data = alph.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.086680 -0.018801  0.005336  0.018574  0.070326 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -2.620e+00  1.420e+00  -1.845   0.0816 . 
## latitude     9.211e-02  4.214e-02   2.186   0.0423 * 
## elevation_m -1.452e-04  3.743e-05  -3.880   0.0011 **
## temp_C       6.550e-03  4.182e-03   1.566   0.1347   
## DO_perc     -5.754e-03  2.741e-03  -2.099   0.0502 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03961 on 18 degrees of freedom
## Multiple R-squared:  0.5666, Adjusted R-squared:  0.4703 
## F-statistic: 5.883 on 4 and 18 DF,  p-value: 0.00329
Anova(Best.alpha.wat, type=2) # 
## Anova Table (Type II tests)
## 
## Response: Alphapr.rel_abund.sum
##                Sum Sq Df F value   Pr(>F)   
## latitude    0.0074952  1  4.7777 0.042288 * 
## elevation_m 0.0236125  1 15.0513 0.001098 **
## temp_C      0.0038478  1  2.4527 0.134736   
## DO_perc     0.0069143  1  4.4074 0.050152 . 
## Residuals   0.0282384 18                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(allEffects(Best.alpha.wat))

#elevation
summary(lm(Alphapr.rel_abund.sum ~ elevation_m , data = alph.wat)) # R2=0.284, #p=0.005
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ elevation_m, data = alph.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.071040 -0.028343 -0.006689  0.018396  0.116960 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.855e-01  8.849e-02   4.357 0.000277 ***
## elevation_m -9.438e-05  3.027e-05  -3.117 0.005210 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04606 on 21 degrees of freedom
## Multiple R-squared:  0.3164, Adjusted R-squared:  0.2838 
## F-statistic: 9.718 on 1 and 21 DF,  p-value: 0.00521

#cond
summary(lm(Alphapr.rel_abund.sum ~ cond , data = alph.wat)) #R2=0.450, #p=0.009,
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ cond, data = alph.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.060495 -0.024607 -0.007579  0.016369  0.133891 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 0.0596024  0.0204368   2.916  0.00825 **
## cond        0.0006474  0.0002244   2.885  0.00885 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04714 on 21 degrees of freedom
## Multiple R-squared:  0.2839, Adjusted R-squared:  0.2498 
## F-statistic: 8.326 on 1 and 21 DF,  p-value: 0.008851

#Temp
summary(lm(Alphapr.rel_abund.sum ~ temp_C , data = alph.wat)) # R2=0.145, #p=0.041
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ temp_C, data = alph.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.085457 -0.020715 -0.004945  0.022312  0.128201 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.059162   0.079014  -0.749   0.4623  
## temp_C       0.010285   0.004725   2.177   0.0411 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05031 on 21 degrees of freedom
## Multiple R-squared:  0.1841, Adjusted R-squared:  0.1452 
## F-statistic: 4.738 on 1 and 21 DF,  p-value: 0.04106

#latitude
summary(lm(Alphapr.rel_abund.sum ~ latitude , data = alph.wat)) #NS
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ latitude, data = alph.wat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.05840 -0.03345 -0.01050  0.01813  0.16980 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.66106    1.58255   0.418    0.680
## latitude    -0.01455    0.04188  -0.347    0.732
## 
## Residual standard error: 0.05554 on 21 degrees of freedom
## Multiple R-squared:  0.005714,   Adjusted R-squared:  -0.04163 
## F-statistic: 0.1207 on 1 and 21 DF,  p-value: 0.7318

#DO 
summary(lm(Alphapr.rel_abund.sum ~ DO_perc , data = alph.wat)) #NS
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ DO_perc, data = alph.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.059424 -0.032388 -0.005031  0.016735  0.172761 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.094e-01  1.808e-01   0.605    0.552
## DO_perc     2.503e-05  2.383e-03   0.011    0.992
## 
## Residual standard error: 0.0557 on 21 degrees of freedom
## Multiple R-squared:  5.254e-06,  Adjusted R-squared:  -0.04761 
## F-statistic: 0.0001103 on 1 and 21 DF,  p-value: 0.9917


##### now zoops
# testing environmental effects on alphaproteobacteria abundances
FitAll.alpha.zoop<- lm(Alphapr.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + 
                         pH + DO_perc + cond, data = alph.zoop)
stepAIC(FitAll.alpha.zoop, direction = "both") # DO_perc + cond best
## Start:  AIC=-371.04
## Alphapr.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + DO_perc + cond
## 
##               Df Sum of Sq    RSS     AIC
## - elevation_m  1  0.000316 4.3659 -373.03
## - pH           1  0.000545 4.3661 -373.02
## - latitude     1  0.004957 4.3706 -372.90
## - log(DOC)     1  0.008261 4.3739 -372.81
## - chla_ug.L    1  0.010560 4.3762 -372.75
## - cond         1  0.043108 4.4087 -371.88
## - temp_C       1  0.045835 4.4114 -371.81
## <none>                     4.3656 -371.04
## - DO_perc      1  0.114929 4.4805 -369.97
## 
## Step:  AIC=-373.03
## Alphapr.rel_abund.sum ~ latitude + temp_C + log(DOC) + chla_ug.L + 
##     pH + DO_perc + cond
## 
##               Df Sum of Sq    RSS     AIC
## - pH           1  0.000706 4.3666 -375.01
## - log(DOC)     1  0.008208 4.3741 -374.81
## - chla_ug.L    1  0.010951 4.3769 -374.73
## - latitude     1  0.019517 4.3854 -374.50
## - temp_C       1  0.047068 4.4130 -373.76
## <none>                     4.3659 -373.03
## - cond         1  0.135605 4.5015 -371.42
## + elevation_m  1  0.000316 4.3656 -371.04
## - DO_perc      1  0.166853 4.5328 -370.60
## 
## Step:  AIC=-375.01
## Alphapr.rel_abund.sum ~ latitude + temp_C + log(DOC) + chla_ug.L + 
##     DO_perc + cond
## 
##               Df Sum of Sq    RSS     AIC
## - log(DOC)     1  0.007533 4.3742 -376.81
## - chla_ug.L    1  0.011124 4.3777 -376.71
## - latitude     1  0.021319 4.3879 -376.44
## <none>                     4.3666 -375.01
## - temp_C       1  0.076307 4.4429 -374.97
## + pH           1  0.000706 4.3659 -373.03
## + elevation_m  1  0.000477 4.3661 -373.02
## - cond         1  0.153398 4.5200 -372.94
## - DO_perc      1  0.192631 4.5593 -371.92
## 
## Step:  AIC=-376.81
## Alphapr.rel_abund.sum ~ latitude + temp_C + chla_ug.L + DO_perc + 
##     cond
## 
##               Df Sum of Sq    RSS     AIC
## - chla_ug.L    1  0.018774 4.3929 -378.30
## - latitude     1  0.020875 4.3950 -378.24
## - temp_C       1  0.073022 4.4472 -376.85
## <none>                     4.3742 -376.81
## + log(DOC)     1  0.007533 4.3666 -375.01
## + elevation_m  1  0.000210 4.3739 -374.81
## + pH           1  0.000030 4.3741 -374.81
## - DO_perc      1  0.185121 4.5593 -373.92
## - cond         1  0.241303 4.6155 -372.47
## 
## Step:  AIC=-378.3
## Alphapr.rel_abund.sum ~ latitude + temp_C + DO_perc + cond
## 
##               Df Sum of Sq    RSS     AIC
## - temp_C       1  0.058239 4.4512 -378.75
## - latitude     1  0.059551 4.4525 -378.71
## <none>                     4.3929 -378.30
## + chla_ug.L    1  0.018774 4.3742 -376.81
## + log(DOC)     1  0.015183 4.3777 -376.71
## + elevation_m  1  0.000382 4.3925 -376.31
## + pH           1  0.000000 4.3929 -376.30
## - DO_perc      1  0.181460 4.5744 -375.52
## - cond         1  0.239935 4.6329 -374.03
## 
## Step:  AIC=-378.75
## Alphapr.rel_abund.sum ~ latitude + DO_perc + cond
## 
##               Df Sum of Sq    RSS     AIC
## - latitude     1  0.036590 4.4878 -379.78
## <none>                     4.4512 -378.75
## + temp_C       1  0.058239 4.3929 -378.30
## + pH           1  0.018844 4.4323 -377.25
## + log(DOC)     1  0.006747 4.4444 -376.93
## + chla_ug.L    1  0.003990 4.4472 -376.85
## + elevation_m  1  0.003301 4.4479 -376.83
## - cond         1  0.183524 4.6347 -375.98
## - DO_perc      1  0.265381 4.7165 -373.91
## 
## Step:  AIC=-379.78
## Alphapr.rel_abund.sum ~ DO_perc + cond
## 
##               Df Sum of Sq    RSS     AIC
## <none>                     4.4878 -379.78
## + latitude     1  0.036590 4.4512 -378.75
## + temp_C       1  0.035278 4.4525 -378.71
## + chla_ug.L    1  0.024218 4.4635 -378.42
## + pH           1  0.018756 4.4690 -378.28
## + elevation_m  1  0.015798 4.4720 -378.20
## + log(DOC)     1  0.013570 4.4742 -378.14
## - cond         1  0.152796 4.6406 -377.83
## - DO_perc      1  0.230663 4.7184 -375.87
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ DO_perc + cond, data = alph.zoop)
## 
## Coefficients:
## (Intercept)      DO_perc         cond  
##     1.18502     -0.01113     -0.00108

# best model
Best.alpha.zoop<-lm(Alphapr.rel_abund.sum ~  DO_perc + cond, data = alph.zoop)
summary(Best.alpha.zoop)
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ DO_perc + cond, data = alph.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.36129 -0.13475 -0.05942  0.09014  0.72804 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.1850179  0.3208201   3.694  0.00034 ***
## DO_perc     -0.0111290  0.0045776  -2.431  0.01659 *  
## cond        -0.0010803  0.0005459  -1.979  0.05023 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1975 on 115 degrees of freedom
## Multiple R-squared:  0.1476, Adjusted R-squared:  0.1328 
## F-statistic: 9.959 on 2 and 115 DF,  p-value: 0.0001026
Anova(Best.alpha.zoop, type=2) # DO is most important, cond less so.
## Anova Table (Type II tests)
## 
## Response: Alphapr.rel_abund.sum
##           Sum Sq  Df F value  Pr(>F)  
## DO_perc   0.2307   1  5.9108 0.01659 *
## cond      0.1528   1  3.9154 0.05023 .
## Residuals 4.4878 115                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(allEffects(Best.alpha.zoop))

summary(lm(Alphapr.rel_abund.sum ~ DO_perc , data = alph.zoop)) #p=0.001, R2=0.111
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ DO_perc, data = alph.zoop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3221 -0.1407 -0.0635  0.1096  0.7227 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.440013   0.297470   4.841 4.02e-06 ***
## DO_perc     -0.015750   0.003986  -3.951 0.000134 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2 on 116 degrees of freedom
## Multiple R-squared:  0.1186, Adjusted R-squared:  0.111 
## F-statistic: 15.61 on 1 and 116 DF,  p-value: 0.0001341
summary(lm(Alphapr.rel_abund.sum ~ cond , data = alph.zoop)) #p<0.001, R2=0.096
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ cond, data = alph.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32677 -0.13711 -0.06954  0.11311  0.72219 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.4120099  0.0437023   9.428 5.21e-16 ***
## cond        -0.0017573  0.0004794  -3.666 0.000373 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2017 on 116 degrees of freedom
## Multiple R-squared:  0.1038, Adjusted R-squared:  0.09609 
## F-statistic: 13.44 on 1 and 116 DF,  p-value: 0.0003734
summary(lm(Alphapr.rel_abund.sum ~ elevation_m , data = alph.zoop)) #p=0.007, R2=0.053
## 
## Call:
## lm(formula = Alphapr.rel_abund.sum ~ elevation_m, data = alph.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24703 -0.16118 -0.06126  0.10021  0.70034 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -3.968e-01  2.431e-01  -1.632  0.10529   
## elevation_m  2.225e-04  8.121e-05   2.739  0.00713 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2065 on 116 degrees of freedom
## Multiple R-squared:  0.06076,    Adjusted R-squared:  0.05266 
## F-statistic: 7.503 on 1 and 116 DF,  p-value: 0.007131

# plot it
# DO important for zoops
alpha_DO<-ggplot(alphaproteobacteria, aes(x=DO_perc, y=Alphapr.rel_abund.sum, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  geom_point(size=1.5, alpha=0.6)+
  ylim(c(0,1))+
  xlab("Dissolved Oxygen (%)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16)) +
  ggtitle("Alphaproteobacteria") + theme_classic() +
  annotate(geom="text", x=75, y=0.75, label="Zoop Adj-R2=0.111, p=0.001", size=2,
           color="black") 
alpha_DO

# condutivity
alpha_cond <-ggplot(alphaproteobacteria, aes(x=cond, y=Alphapr.rel_abund.sum, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  geom_point(size=1.5, alpha=0.6)+
  ylim(c(0,1))+
  xlab("Conductivity(uS/cm)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16)) +
  ggtitle("Alphaproteobacteria") + theme_classic() +
  annotate(geom="text", x=75, y=0.75, label="Zoop Adj-R2=0.096, p<0.001", size=2,
           color="black") +
  annotate(geom="text", x=75, y=0, label="Water Adj-R2=0.250, p=0.009", size=2,
           color="black")
alpha_cond

# elevation
alpha_elev <-ggplot(alphaproteobacteria, aes(x=elevation_m, y=Alphapr.rel_abund.sum, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  geom_point(size=1.5, alpha=0.6)+
  ylim(c(0,1))+
  xlab("Elevation (m)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16)) +
  ggtitle("Alphaproteobacteria") + theme_classic() +
  annotate(geom="text", x=2700, y=0.75, label="Zoop Adj-R2=0.053, p=0.007", size=2,
           color="black") +
  annotate(geom="text", x=2700, y=0.5, label="Water Adj-R2=0.284, p=0.005", size=2, 
           color="black")
alpha_elev


##### #ACTINOBACTERIA
actinobacteria <- subset(Abund.final, Class == "Actinobacteria")
actinobacteria <- summaryBy(rel_abund ~ sampleNames, data = actinobacteria, FUN = sum)
actinobacteria <- data.frame(actinobacteria, row.names = actinobacteria[,1])
names(actinobacteria)[names(actinobacteria) == 'rel_abund.sum'] <- 'Actinob.rel_abund.sum'
actinobacteria <- merge(actinobacteria, sample_data(PS.500), by = "row.names")
names(actinobacteria)[names(actinobacteria) == 'sampleNames.x'] <- 'sampleNames'

#run a model
Actin.model<-lm(Actinob.rel_abund.sum ~ time_point + lake + phy_group, data = actinobacteria)
Anova(Actin.model)
## Anova Table (Type II tests)
## 
## Response: Actinob.rel_abund.sum
##              Sum Sq Df F value   Pr(>F)    
## time_point 0.025020  4  1.7941 0.167752    
## lake       0.067772  5  3.8879 0.011894 *  
## phy_group  0.094220  2 13.5127 0.000169 ***
## Residuals  0.073213 21                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(allEffects(Actin.model))
# lake, phy.group all matter

# separate sample types
act.wat<-actinobacteria[(actinobacteria$sample_type=="Water"),]
act.zoop<-actinobacteria[(actinobacteria$sample_type=="Zooplankton"),]

# testing environmental effects on alphaproteobacteria abundances
### WATER first
FitAll.actin.wat<- lm(Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + 
                         pH + DO_perc + cond,  data = act.wat)
stepAIC(FitAll.actin.wat, direction = "both") # cond best model
## Start:  AIC=-130.98
## Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + DO_perc + cond
## 
##               Df  Sum of Sq      RSS     AIC
## - pH           1 0.00000034 0.084412 -132.98
## - elevation_m  1 0.00002884 0.084441 -132.97
## - latitude     1 0.00002887 0.084441 -132.97
## - chla_ug.L    1 0.00015801 0.084570 -132.94
## - DO_perc      1 0.00041062 0.084822 -132.86
## - temp_C       1 0.00042635 0.084838 -132.85
## - log(DOC)     1 0.00122028 0.085632 -132.61
## - cond         1 0.00271354 0.087125 -132.16
## <none>                      0.084412 -130.98
## 
## Step:  AIC=-132.98
## Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + DO_perc + cond
## 
##               Df  Sum of Sq      RSS     AIC
## - elevation_m  1 0.00002978 0.084442 -134.97
## - latitude     1 0.00003114 0.084443 -134.97
## - chla_ug.L    1 0.00016017 0.084572 -134.93
## - DO_perc      1 0.00044106 0.084853 -134.85
## - temp_C       1 0.00084867 0.085261 -134.72
## - log(DOC)     1 0.00123447 0.085647 -134.61
## - cond         1 0.00276521 0.087177 -134.15
## <none>                      0.084412 -132.98
## + pH           1 0.00000034 0.084412 -130.98
## 
## Step:  AIC=-134.97
## Actinob.rel_abund.sum ~ latitude + temp_C + log(DOC) + chla_ug.L + 
##     DO_perc + cond
## 
##               Df Sum of Sq      RSS     AIC
## - chla_ug.L    1 0.0001618 0.084604 -136.93
## - DO_perc      1 0.0004131 0.084855 -136.85
## - latitude     1 0.0007143 0.085156 -136.75
## - temp_C       1 0.0008192 0.085261 -136.72
## - log(DOC)     1 0.0016328 0.086075 -136.48
## <none>                     0.084442 -134.97
## + elevation_m  1 0.0000298 0.084412 -132.98
## + pH           1 0.0000013 0.084441 -132.97
## - cond         1 0.0190458 0.103488 -131.69
## 
## Step:  AIC=-136.92
## Actinob.rel_abund.sum ~ latitude + temp_C + log(DOC) + DO_perc + 
##     cond
## 
##               Df Sum of Sq      RSS     AIC
## - DO_perc      1 0.0003093 0.084913 -138.83
## - latitude     1 0.0005844 0.085188 -138.75
## - temp_C       1 0.0010529 0.085657 -138.60
## - log(DOC)     1 0.0020719 0.086676 -138.30
## <none>                     0.084604 -136.93
## + chla_ug.L    1 0.0001618 0.084442 -134.97
## + elevation_m  1 0.0000314 0.084572 -134.93
## + pH           1 0.0000000 0.084604 -134.93
## - cond         1 0.0225496 0.107153 -132.78
## 
## Step:  AIC=-138.83
## Actinob.rel_abund.sum ~ latitude + temp_C + log(DOC) + cond
## 
##               Df Sum of Sq      RSS     AIC
## - temp_C       1  0.000859 0.085772 -140.57
## - latitude     1  0.001157 0.086070 -140.48
## - log(DOC)     1  0.001799 0.086712 -140.28
## <none>                     0.084913 -138.83
## + DO_perc      1  0.000309 0.084604 -136.93
## + chla_ug.L    1  0.000058 0.084855 -136.85
## + pH           1  0.000017 0.084896 -136.84
## + elevation_m  1  0.000000 0.084913 -136.83
## - cond         1  0.047097 0.132010 -129.36
## 
## Step:  AIC=-140.57
## Actinob.rel_abund.sum ~ latitude + log(DOC) + cond
## 
##               Df Sum of Sq      RSS     AIC
## - latitude     1  0.000898 0.086670 -142.30
## - log(DOC)     1  0.002716 0.088487 -141.76
## <none>                     0.085772 -140.57
## + temp_C       1  0.000859 0.084913 -138.83
## + pH           1  0.000513 0.085259 -138.72
## + chla_ug.L    1  0.000243 0.085529 -138.64
## + DO_perc      1  0.000115 0.085657 -138.60
## + elevation_m  1  0.000020 0.085752 -138.57
## - cond         1  0.050180 0.135952 -130.59
## 
## Step:  AIC=-142.3
## Actinob.rel_abund.sum ~ log(DOC) + cond
## 
##               Df Sum of Sq      RSS     AIC
## - log(DOC)     1  0.002559 0.089229 -143.54
## <none>                     0.086670 -142.30
## + latitude     1  0.000898 0.085772 -140.57
## + pH           1  0.000791 0.085879 -140.54
## + elevation_m  1  0.000668 0.086002 -140.50
## + temp_C       1  0.000600 0.086070 -140.48
## + DO_perc      1  0.000498 0.086172 -140.45
## + chla_ug.L    1  0.000004 0.086665 -140.30
## - cond         1  0.073421 0.160091 -128.34
## 
## Step:  AIC=-143.54
## Actinob.rel_abund.sum ~ cond
## 
##               Df Sum of Sq      RSS     AIC
## <none>                     0.089229 -143.54
## + log(DOC)     1  0.002559 0.086670 -142.30
## + pH           1  0.001770 0.087458 -142.06
## + temp_C       1  0.001390 0.087838 -141.95
## + elevation_m  1  0.001079 0.088149 -141.86
## + latitude     1  0.000741 0.088487 -141.76
## + chla_ug.L    1  0.000450 0.088779 -141.67
## + DO_perc      1  0.000033 0.089196 -141.55
## - cond         1  0.073600 0.162828 -129.90
## 
## Call:
## lm(formula = Actinob.rel_abund.sum ~ cond, data = act.wat)
## 
## Coefficients:
## (Intercept)         cond  
##    0.288878    -0.001128

# best model
Best.actin<-lm(Actinob.rel_abund.sum ~ cond, data = act.wat)
Anova(Best.actin)
## Anova Table (Type II tests)
## 
## Response: Actinob.rel_abund.sum
##             Sum Sq Df F value    Pr(>F)    
## cond      0.073600  1  19.796 0.0001685 ***
## Residuals 0.089229 24                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(Best.actin) # cond is most important (p<0.001, R2=0.429)  
## 
## Call:
## lm(formula = Actinob.rel_abund.sum ~ cond, data = act.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.089447 -0.041391 -0.002101  0.032377  0.162297 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.2888776  0.0214845  13.446 1.15e-12 ***
## cond        -0.0011279  0.0002535  -4.449 0.000168 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06097 on 24 degrees of freedom
## Multiple R-squared:  0.452,  Adjusted R-squared:  0.4292 
## F-statistic:  19.8 on 1 and 24 DF,  p-value: 0.0001685
plot(allEffects(Best.actin))

#elevation
summary(lm(Actinob.rel_abund.sum ~ elevation_m, data = act.wat)) #R2=0.216, p=0.010
## 
## Call:
## lm(formula = Actinob.rel_abund.sum ~ elevation_m, data = act.wat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.13821 -0.04843 -0.01485  0.03996  0.19877 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -1.635e-01  1.336e-01  -1.224  0.23286   
## elevation_m  1.270e-04  4.525e-05   2.807  0.00977 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07147 on 24 degrees of freedom
## Multiple R-squared:  0.2472, Adjusted R-squared:  0.2158 
## F-statistic: 7.879 on 1 and 24 DF,  p-value: 0.009769

#cond
summary(lm(Actinob.rel_abund.sum ~ cond, data = act.wat)) #(p<0.001, R2=0.429)  
## 
## Call:
## lm(formula = Actinob.rel_abund.sum ~ cond, data = act.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.089447 -0.041391 -0.002101  0.032377  0.162297 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.2888776  0.0214845  13.446 1.15e-12 ***
## cond        -0.0011279  0.0002535  -4.449 0.000168 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06097 on 24 degrees of freedom
## Multiple R-squared:  0.452,  Adjusted R-squared:  0.4292 
## F-statistic:  19.8 on 1 and 24 DF,  p-value: 0.0001685

### now Zoops 
FitAll.actin.zoop<- lm(Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + 
                         pH + DO_perc + cond, data = act.zoop)
step(FitAll.actin.zoop, direction = "both") # no effects and no relationship
## Start:  AIC=-55.67
## Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + DO_perc + cond
## 
## 
## Step:  AIC=-55.67
## Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + DO_perc
## 
## 
## Step:  AIC=-55.67
## Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH
## 
## 
## Step:  AIC=-55.67
## Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L
## 
## 
## Step:  AIC=-55.67
## Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC)
## 
## 
## Step:  AIC=-55.67
## Actinob.rel_abund.sum ~ latitude + elevation_m + temp_C
## 
## 
## Step:  AIC=-55.67
## Actinob.rel_abund.sum ~ latitude + elevation_m
## 
##               Df  Sum of Sq       RSS     AIC
## - elevation_m  1 1.4335e-06 0.0010468 -57.656
## - latitude     1 1.5953e-06 0.0010469 -57.655
## <none>                      0.0010453 -55.665
## 
## Step:  AIC=-57.66
## Actinob.rel_abund.sum ~ latitude
## 
##               Df Sum of Sq       RSS     AIC
## - latitude     1 5.609e-05 0.0011029 -59.290
## <none>                     0.0010468 -57.656
## + elevation_m  1 1.434e-06 0.0010453 -55.665
## + temp_C       1 1.434e-06 0.0010453 -55.665
## + log(DOC)     1 1.434e-06 0.0010453 -55.665
## + chla_ug.L    1 1.434e-06 0.0010453 -55.665
## + pH           1 1.434e-06 0.0010453 -55.665
## + DO_perc      1 1.434e-06 0.0010453 -55.665
## + cond         1 1.434e-06 0.0010453 -55.665
## 
## Step:  AIC=-59.29
## Actinob.rel_abund.sum ~ 1
## 
##               Df  Sum of Sq       RSS     AIC
## <none>                      0.0011029 -59.290
## + latitude     1 5.6090e-05 0.0010468 -57.656
## + elevation_m  1 5.5929e-05 0.0010469 -57.655
## + cond         1 5.5157e-05 0.0010477 -57.649
## + log(DOC)     1 5.4857e-05 0.0010480 -57.647
## + DO_perc      1 5.3939e-05 0.0010489 -57.641
## + pH           1 1.4301e-05 0.0010886 -57.382
## + chla_ug.L    1 3.9870e-06 0.0010989 -57.316
## + temp_C       1 3.0100e-07 0.0011026 -57.292
## 
## Call:
## lm(formula = Actinob.rel_abund.sum ~ 1, data = act.zoop)
## 
## Coefficients:
## (Intercept)  
##     0.07286
summary(lm(Actinob.rel_abund.sum ~ cond, data = act.zoop)) # for the plot
## 
## Call:
## lm(formula = Actinob.rel_abund.sum ~ cond, data = act.zoop)
## 
## Residuals:
##          1          2          3          6         25         32         33 
##  1.814e-05  3.559e-05 -1.196e-02  1.004e-02  1.996e-02  1.958e-03 -2.004e-02 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.513e-02  7.040e-03  10.672 0.000125 ***
## cond        -7.021e-05  1.368e-04  -0.513 0.629768    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01448 on 5 degrees of freedom
## Multiple R-squared:  0.05001,    Adjusted R-squared:  -0.14 
## F-statistic: 0.2632 on 1 and 5 DF,  p-value: 0.6298

anova(lm(Actinob.rel_abund.sum ~ 1, data = act.zoop)) #NA intercept fit
## Analysis of Variance Table
## 
## Response: Actinob.rel_abund.sum
##           Df    Sum Sq    Mean Sq F value Pr(>F)
## Residuals  6 0.0011029 0.00018381

# for plotting
summary(lm(Actinob.rel_abund.sum ~ cond, data = act.zoop)) #R2=-0.140, p=0.629
## 
## Call:
## lm(formula = Actinob.rel_abund.sum ~ cond, data = act.zoop)
## 
## Residuals:
##          1          2          3          6         25         32         33 
##  1.814e-05  3.559e-05 -1.196e-02  1.004e-02  1.996e-02  1.958e-03 -2.004e-02 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.513e-02  7.040e-03  10.672 0.000125 ***
## cond        -7.021e-05  1.368e-04  -0.513 0.629768    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01448 on 5 degrees of freedom
## Multiple R-squared:  0.05001,    Adjusted R-squared:  -0.14 
## F-statistic: 0.2632 on 1 and 5 DF,  p-value: 0.6298
summary(lm(Actinob.rel_abund.sum ~ elevation_m, data = act.zoop)) #R2=-0.139, p=0.627
## 
## Call:
## lm(formula = Actinob.rel_abund.sum ~ elevation_m, data = act.zoop)
## 
## Residuals:
##          1          2          3          6         25         32         33 
##  5.809e-05  1.420e-04 -1.186e-02  1.014e-02  1.984e-02  1.839e-03 -2.016e-02 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.329e-02  5.747e-02   0.753    0.485
## elevation_m 9.785e-06  1.893e-05   0.517    0.627
## 
## Residual standard error: 0.01447 on 5 degrees of freedom
## Multiple R-squared:  0.05071,    Adjusted R-squared:  -0.1391 
## F-statistic: 0.2671 on 1 and 5 DF,  p-value: 0.6273

# plot it
actino_cond <- ggplot(actinobacteria, aes(x=cond, y=Actinob.rel_abund.sum, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  geom_point(size=1.5, alpha=0.6)+
  ylim(c(0,1))+
  xlab("Conductivity a (uS/cm)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16)) +
  ggtitle("Actinobacteria") + theme_classic() +
   annotate(geom="text", x=70, y=0.50, label="Water Adj-R2=0.429, p<0.001", size=2,
           color="black") +
  annotate(geom="text", x=70, y=0.40, label="Zoop = NS", size=2,
           color="black")
actino_cond 

#elevation
actino_elev <- ggplot(actinobacteria, aes(x=elevation_m, y=Actinob.rel_abund.sum, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  geom_point(size=1.5, alpha=0.6)+
  ylim(c(0,1))+
  xlab("Elevation (m)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16)) +
  ggtitle("Actinobacteria") + theme_classic() +
   annotate(geom="text", x=2700, y=0.50, label="Water Adj-R2=0.216, p=0.010", size=2, 
           color="black")
actino_elev 


##### Bacteroidia
bacteroidia <- subset(Abund.final, Class == "Bacteroidia")
bacteroidia <- summaryBy(rel_abund ~ sampleNames, data = bacteroidia, FUN = sum)
bacteroidia <- data.frame(bacteroidia, row.names = bacteroidia[,1])
names(bacteroidia)[names(bacteroidia) == 'rel_abund.sum'] <- 'Bactero.rel_abund.sum'
bacteroidia <- merge(bacteroidia, sample_data(PS.500), by = "row.names")
names(bacteroidia)[names(bacteroidia) == 'sampleNames.x'] <- 'sampleNames'

#run a model
Bactero.model<-lm(Bactero.rel_abund.sum ~ time_point + lake + phy_group, data = bacteroidia)
Anova(Bactero.model, type=2)
## Anova Table (Type II tests)
## 
## Response: Bactero.rel_abund.sum
##            Sum Sq  Df F value    Pr(>F)    
## time_point 0.4021   4  3.0454   0.01815 *  
## lake       1.2648   5  7.6643 1.239e-06 ***
## phy_group  0.7171   2 10.8635 3.255e-05 ***
## Residuals  6.8651 208                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(allEffects(Bactero.model)) # time marginal signif, lake and phy strong

# separate sample types
bact.wat<-bacteroidia[(bacteroidia$sample_type=="Water"),]
bact.zoop<-bacteroidia[(bacteroidia$sample_type=="Zooplankton"),]

### Water first
# testing environmental effects on alphaproteobacteria abundances
FitAll.Bactero.wat<- lm(Bactero.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + 
                         pH + DO_perc + cond, data = bact.wat)
stepAIC(FitAll.Bactero.wat, direction = "both") # DO_perc + elevation_m  best
## Start:  AIC=-132.58
## Bactero.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + DO_perc + cond
## 
##               Df Sum of Sq     RSS     AIC
## - pH           1  0.000248 0.12954 -134.53
## - chla_ug.L    1  0.002490 0.13178 -134.05
## - temp_C       1  0.002859 0.13215 -133.97
## - log(DOC)     1  0.006123 0.13541 -133.29
## <none>                     0.12929 -132.58
## - cond         1  0.015053 0.14434 -131.50
## - elevation_m  1  0.015489 0.14478 -131.41
## - latitude     1  0.041192 0.17048 -126.84
## - DO_perc      1  0.042928 0.17222 -126.55
## 
## Step:  AIC=-134.53
## Bactero.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + DO_perc + cond
## 
##               Df Sum of Sq     RSS     AIC
## - chla_ug.L    1  0.002341 0.13188 -136.03
## - temp_C       1  0.003229 0.13277 -135.84
## - log(DOC)     1  0.005907 0.13545 -135.28
## <none>                     0.12954 -134.53
## - cond         1  0.014897 0.14444 -133.48
## - elevation_m  1  0.015285 0.14482 -133.40
## + pH           1  0.000248 0.12929 -132.58
## - latitude     1  0.041004 0.17054 -128.83
## - DO_perc      1  0.043618 0.17316 -128.40
## 
## Step:  AIC=-136.03
## Bactero.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     DO_perc + cond
## 
##               Df Sum of Sq     RSS     AIC
## - temp_C       1  0.002076 0.13395 -137.59
## - log(DOC)     1  0.004513 0.13639 -137.08
## <none>                     0.13188 -136.03
## - elevation_m  1  0.014902 0.14678 -135.03
## - cond         1  0.017081 0.14896 -134.62
## + chla_ug.L    1  0.002341 0.12954 -134.53
## + pH           1  0.000098 0.13178 -134.05
## - latitude     1  0.038960 0.17084 -130.78
## - DO_perc      1  0.041279 0.17316 -130.40
## 
## Step:  AIC=-137.59
## Bactero.rel_abund.sum ~ latitude + elevation_m + log(DOC) + DO_perc + 
##     cond
## 
##               Df Sum of Sq     RSS     AIC
## - log(DOC)     1  0.006834 0.14079 -138.19
## <none>                     0.13395 -137.59
## - elevation_m  1  0.014002 0.14796 -136.81
## - cond         1  0.016879 0.15083 -136.27
## + temp_C       1  0.002076 0.13188 -136.03
## + chla_ug.L    1  0.001189 0.13277 -135.84
## + pH           1  0.000476 0.13348 -135.69
## - latitude     1  0.038534 0.17249 -132.51
## - DO_perc      1  0.045888 0.17984 -131.34
## 
## Step:  AIC=-138.2
## Bactero.rel_abund.sum ~ latitude + elevation_m + DO_perc + cond
## 
##               Df Sum of Sq     RSS     AIC
## <none>                     0.14079 -138.19
## + log(DOC)     1  0.006834 0.13395 -137.59
## + temp_C       1  0.004398 0.13639 -137.08
## + pH           1  0.001787 0.13900 -136.55
## + chla_ug.L    1  0.000054 0.14074 -136.21
## - elevation_m  1  0.028142 0.16893 -135.09
## - cond         1  0.035630 0.17642 -133.88
## - DO_perc      1  0.053131 0.19392 -131.23
## - latitude     1  0.059926 0.20072 -130.27
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ latitude + elevation_m + 
##     DO_perc + cond, data = bact.wat)
## 
## Coefficients:
## (Intercept)     latitude  elevation_m      DO_perc         cond  
##  -1.688e+01    4.670e-01   -4.257e-04    1.405e-02   -3.352e-03

# best model
Best.bactero.wat<-lm(Bactero.rel_abund.sum ~ latitude + elevation_m + DO_perc + cond, data = bact.wat)
summary(Best.bactero.wat)
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ latitude + elevation_m + 
##     DO_perc + cond, data = bact.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.127598 -0.049613 -0.007861  0.040771  0.153606 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -1.688e+01  4.862e+00  -3.473  0.00206 **
## latitude     4.670e-01  1.493e-01   3.129  0.00471 **
## elevation_m -4.257e-04  1.985e-04  -2.144  0.04282 * 
## DO_perc      1.405e-02  4.768e-03   2.946  0.00725 **
## cond        -3.352e-03  1.389e-03  -2.413  0.02421 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07824 on 23 degrees of freedom
## Multiple R-squared:  0.7309, Adjusted R-squared:  0.6841 
## F-statistic: 15.62 on 4 and 23 DF,  p-value: 2.611e-06
Anova(Best.bactero.wat, type=2) # DO signif, increases
## Anova Table (Type II tests)
## 
## Response: Bactero.rel_abund.sum
##               Sum Sq Df F value   Pr(>F)   
## latitude    0.059926  1  9.7898 0.004713 **
## elevation_m 0.028142  1  4.5973 0.042821 * 
## DO_perc     0.053131  1  8.6798 0.007249 **
## cond        0.035630  1  5.8207 0.024212 * 
## Residuals   0.140789 23                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(allEffects(Best.bactero.wat))

# water DO
summary(lm(Bactero.rel_abund.sum ~ DO_perc, data = bact.wat)) #R2=0.561, p<0.001
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ DO_perc, data = bact.wat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.12292 -0.05775 -0.01318  0.02832  0.25044 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.199815   0.254094  -4.722 7.00e-05 ***
## DO_perc      0.020127   0.003378   5.958 2.75e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09224 on 26 degrees of freedom
## Multiple R-squared:  0.5772, Adjusted R-squared:  0.5609 
## F-statistic: 35.49 on 1 and 26 DF,  p-value: 2.746e-06

# water latitude
summary(lm(Bactero.rel_abund.sum ~ latitude, data = bact.wat)) #R2=0.449, p<0.001
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ latitude, data = bact.wat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.152695 -0.072496 -0.001836  0.041956  0.227238 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -12.07708    2.58485  -4.672 7.98e-05 ***
## latitude      0.32817    0.06848   4.792 5.81e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1034 on 26 degrees of freedom
## Multiple R-squared:  0.469,  Adjusted R-squared:  0.4486 
## F-statistic: 22.97 on 1 and 26 DF,  p-value: 5.812e-05

# water conductivity
summary(lm(Bactero.rel_abund.sum ~ cond, data = bact.wat)) #R2=0.112, p=0.046
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ cond, data = bact.wat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25416 -0.07825 -0.02372  0.09471  0.28574 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.2119722  0.0440162   4.816 5.47e-05 ***
## cond        0.0013890  0.0005226   2.658   0.0133 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1258 on 26 degrees of freedom
## Multiple R-squared:  0.2136, Adjusted R-squared:  0.1834 
## F-statistic: 7.064 on 1 and 26 DF,  p-value: 0.01327

# water elevation
anova(lm(Bactero.rel_abund.sum ~ elevation_m, data = bact.wat)) #NS
## Analysis of Variance Table
## 
## Response: Bactero.rel_abund.sum
##             Df  Sum Sq  Mean Sq F value Pr(>F)
## elevation_m  1 0.01874 0.018742  0.9659 0.3348
## Residuals   26 0.50451 0.019404

### now zoops
FitAll.Bactero.zoop<- lm(Bactero.rel_abund.sum ~  latitude + elevation_m + temp_C + log(DOC) + chla_ug.L + 
                         pH + DO_perc + cond,  data = bact.zoop)
stepAIC(FitAll.Bactero.zoop, direction = "both") # latitude + elevation_m + temp_C + DO_perc + cond
## Start:  AIC=-616.08
## Bactero.rel_abund.sum ~ latitude + elevation_m + temp_C + log(DOC) + 
##     chla_ug.L + pH + DO_perc + cond
## 
##               Df Sum of Sq    RSS     AIC
## - log(DOC)     1   0.02219 7.0862 -617.47
## - pH           1   0.02382 7.0878 -617.43
## - chla_ug.L    1   0.04905 7.1130 -616.75
## <none>                     7.0640 -616.08
## - temp_C       1   0.18420 7.2482 -613.13
## - DO_perc      1   0.30919 7.3732 -609.85
## - cond         1   0.44801 7.5120 -606.27
## - latitude     1   0.80605 7.8700 -597.33
## - elevation_m  1   0.82538 7.8894 -596.86
## 
## Step:  AIC=-617.47
## Bactero.rel_abund.sum ~ latitude + elevation_m + temp_C + chla_ug.L + 
##     pH + DO_perc + cond
## 
##               Df Sum of Sq    RSS     AIC
## - pH           1   0.01497 7.1011 -619.07
## - chla_ug.L    1   0.03298 7.1192 -618.58
## <none>                     7.0862 -617.47
## + log(DOC)     1   0.02219 7.0640 -616.08
## - temp_C       1   0.18823 7.2744 -614.44
## - DO_perc      1   0.32042 7.4066 -610.98
## - cond         1   0.81400 7.9002 -598.60
## - latitude     1   0.98855 8.0747 -594.40
## - elevation_m  1   1.13044 8.2166 -591.06
## 
## Step:  AIC=-619.07
## Bactero.rel_abund.sum ~ latitude + elevation_m + temp_C + chla_ug.L + 
##     DO_perc + cond
## 
##               Df Sum of Sq    RSS     AIC
## - chla_ug.L    1   0.03300 7.1341 -620.18
## <none>                     7.1011 -619.07
## + pH           1   0.01497 7.0862 -617.47
## + log(DOC)     1   0.01334 7.0878 -617.43
## - temp_C       1   0.20375 7.3049 -615.64
## - DO_perc      1   0.30920 7.4103 -612.89
## - cond         1   0.80081 7.9020 -600.55
## - latitude     1   0.98898 8.0901 -596.04
## - elevation_m  1   1.12310 8.2242 -592.88
## 
## Step:  AIC=-620.18
## Bactero.rel_abund.sum ~ latitude + elevation_m + temp_C + DO_perc + 
##     cond
## 
##               Df Sum of Sq    RSS     AIC
## <none>                     7.1341 -620.18
## + chla_ug.L    1   0.03300 7.1011 -619.07
## + pH           1   0.01498 7.1192 -618.58
## + log(DOC)     1   0.00237 7.1318 -618.24
## - temp_C       1   0.17927 7.3134 -617.41
## - DO_perc      1   0.34577 7.4799 -613.09
## - latitude     1   0.96461 8.0988 -597.83
## - cond         1   0.99055 8.1247 -597.22
## - elevation_m  1   1.12018 8.2543 -594.18
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ latitude + elevation_m + 
##     temp_C + DO_perc + cond, data = bact.zoop)
## 
## Coefficients:
## (Intercept)     latitude  elevation_m       temp_C      DO_perc         cond  
##  -1.926e+01    6.288e-01   -8.182e-04   -1.600e-02   -1.518e-02   -4.831e-03

# best model
Best.bactero.zoop<-lm(Bactero.rel_abund.sum ~ latitude + elevation_m + temp_C + DO_perc + cond, data = bact.zoop)
summary(Best.bactero.zoop)
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ latitude + elevation_m + 
##     temp_C + DO_perc + cond, data = bact.zoop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3184 -0.1515 -0.0228  0.1442  0.6109 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.926e+01  4.106e+00  -4.691 5.25e-06 ***
## latitude     6.288e-01  1.254e-01   5.015 1.23e-06 ***
## elevation_m -8.182e-04  1.514e-04  -5.404 1.98e-07 ***
## temp_C      -1.600e-02  7.402e-03  -2.162  0.03190 *  
## DO_perc     -1.518e-02  5.056e-03  -3.002  0.00305 ** 
## cond        -4.831e-03  9.507e-04  -5.082 9.06e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1958 on 186 degrees of freedom
## Multiple R-squared:  0.1596, Adjusted R-squared:  0.1371 
## F-statistic: 7.067 on 5 and 186 DF,  p-value: 4.502e-06
Anova(Best.bactero.zoop, type=2) # DOC signif
## Anova Table (Type II tests)
## 
## Response: Bactero.rel_abund.sum
##             Sum Sq  Df F value    Pr(>F)    
## latitude    0.9646   1 25.1490 1.233e-06 ***
## elevation_m 1.1202   1 29.2050 1.977e-07 ***
## temp_C      0.1793   1  4.6740  0.031900 *  
## DO_perc     0.3458   1  9.0148  0.003046 ** 
## cond        0.9906   1 25.8254 9.058e-07 ***
## Residuals   7.1341 186                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(allEffects(Best.bactero.zoop))

# zoop temp
summary(lm(Bactero.rel_abund.sum ~ temp_C, data = bact.zoop)) #0.030, R2=0.019
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ temp_C, data = bact.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30632 -0.17538 -0.03588  0.15878  0.60959 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.579472   0.119286   4.858 2.47e-06 ***
## temp_C      -0.015329   0.007004  -2.189   0.0298 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2088 on 190 degrees of freedom
## Multiple R-squared:  0.02459,    Adjusted R-squared:  0.01946 
## F-statistic:  4.79 on 1 and 190 DF,  p-value: 0.02984

# zoop DO
summary(lm(Bactero.rel_abund.sum ~ DO_perc, data = bact.zoop)) #NS
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ DO_perc, data = bact.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26881 -0.18064 -0.03616  0.14762  0.61866 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.211426   0.266674   0.793    0.429
## DO_perc     0.001444   0.003525   0.410    0.683
## 
## Residual standard error: 0.2113 on 190 degrees of freedom
## Multiple R-squared:  0.0008824,  Adjusted R-squared:  -0.004376 
## F-statistic: 0.1678 on 1 and 190 DF,  p-value: 0.6825

# zoop latitude
summary(lm(Bactero.rel_abund.sum ~ latitude, data = bact.zoop)) #NS
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ latitude, data = bact.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27062 -0.18507 -0.03605  0.15056  0.63032 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.59118    2.16643  -0.734    0.464
## latitude     0.05051    0.05724   0.882    0.379
## 
## Residual standard error: 0.2109 on 190 degrees of freedom
## Multiple R-squared:  0.004082,   Adjusted R-squared:  -0.00116 
## F-statistic: 0.7787 on 1 and 190 DF,  p-value: 0.3787

# zoop elevation
Anova(lm(Bactero.rel_abund.sum ~ elevation_m, data = bact.zoop)) #NS
## Anova Table (Type II tests)
## 
## Response: Bactero.rel_abund.sum
##             Sum Sq  Df F value Pr(>F)
## elevation_m 0.0127   1   0.284 0.5947
## Residuals   8.4768 190

# zoop DO
summary(lm(Bactero.rel_abund.sum ~ cond, data = bact.zoop)) #NS
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ cond, data = bact.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28590 -0.17962 -0.03328  0.15141  0.59098 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.3438007  0.0346141   9.932   <2e-16 ***
## cond        -0.0002812  0.0003749  -0.750    0.454    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2111 on 190 degrees of freedom
## Multiple R-squared:  0.002952,   Adjusted R-squared:  -0.002296 
## F-statistic: 0.5625 on 1 and 190 DF,  p-value: 0.4542

# zoop DOC
summary(lm(Bactero.rel_abund.sum ~ log(DOC), data = bact.zoop)) #p=0.003, R2=0.04
## 
## Call:
## lm(formula = Bactero.rel_abund.sum ~ log(DOC), data = bact.zoop)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.29401 -0.16161 -0.03959  0.14378  0.61140 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.35091    0.01802  19.468  < 2e-16 ***
## log(DOC)    -0.02942    0.00980  -3.002  0.00304 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2065 on 190 degrees of freedom
## Multiple R-squared:  0.04528,    Adjusted R-squared:  0.04026 
## F-statistic: 9.012 on 1 and 190 DF,  p-value: 0.003043

# plot it
bactero_DO <- ggplot(bacteroidia, aes(x=DO_perc, y=Bactero.rel_abund.sum, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.15)+
  geom_point(size=1.5, alpha=0.6)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  ylim(c(0,1))+
  xlab("DO (%)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16)) + 
  ggtitle("Bacteroidia") + theme_classic() +
  annotate(geom="text", x=70, y=1.0, label="Water Adj-R2=0.561, p<0.001", size=2,
           color="black")
bactero_DO 

# DOC importnat (sorta) for zoops
bactero_DOC <- ggplot(bacteroidia, aes(x=log(DOC), y=Bactero.rel_abund.sum, color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.15)+
  geom_point(size=1.5, alpha=0.6)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  ylim(c(0,1))+
  xlab("log(Dissolved organic carbon) (mg/L)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16)) + 
  ggtitle("Bacteroidia") + theme_classic() +
  annotate(geom="text", x=2, y=1.0, label="Zoop Adj-R2=0.040, p=0.003", size=2,
           color="black") +
  annotate(geom="text", x=2, y=0.7, label="Water Adj-R2=0.112, p=0.046", size=2,
           color="black")
bactero_DOC

# temp, signif for zoop (but poor R2)
bactero_temp<-ggplot(bacteroidia, aes(x=temp_C, y=Bactero.rel_abund.sum, 
                                      color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  geom_point(size=1.5, alpha=0.6)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  ylim(c(0,1))+
  xlab("Temperature (C)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16)) + 
  ggtitle("Bacteroidia") + theme_classic() +
  annotate(geom="text", x=20, y=1.0, label="Zoop Adj-R2=0.019, p=0.030", size=2,
           color="black")

# DO, signif for zoop (but poor R2)
bactero_lat<-ggplot(bacteroidia, aes(x=latitude, y=Bactero.rel_abund.sum, 
                                      color=sample_type, fill=sample_type)) +
  geom_smooth(method = lm, alpha=0.2)+
  geom_point(size=1.5, alpha=0.6)+
  scale_color_manual(values = c("dodgerblue", "coral"))+
  scale_fill_manual(values = c("dodgerblue3", "firebrick"))+
  ylim(c(0,1))+
  xlab("Latitude (o N)") + ylab("Relative Abundance") +
  theme(axis.text.x=element_text(size=14)) + theme(axis.title.x=element_text(size=16)) +
  theme(axis.text.y=element_text(size=14)) + theme(axis.title.y=element_text(size=16)) + 
  ggtitle("Bacteroidia") + theme_classic() +
  annotate(geom="text", x=37.6, y=1.0, label="Water Adj-R2=0.449, p<0.001", size=2,
           color="black")
bactero_lat

Pool plots of relative abundance x environment and combine with similar analyses of Shannon diversity and environmental predictors.

#### pool the plots (add in shannon from above)
multregres.plots.shannon.env.wat.zoop<-plot_grid(plot_grid(lat.shan.plot.zoop + guides(color="none", fill="none"),
                                             DO.shan.plot.zoop + guides(color="none", fill="none"),
                                             cond.shan.plot.zoop + guides(color="none", fill="none"),
                                             extract.legend.Shan,
                            ncol=3, nrow=1, labels=c("A", "B", "C", ""), rel_widths=c(5,5,5,0.5)))

Taxa.relabund.env.regr<-plot_grid(gamma_lat + guides(color="none", fill="none"), 
                                  alpha_elev + guides(color="none", fill="none"), 
                                  actino_elev + guides(color="none", fill="none"), 
                                  bactero_lat + guides(color="none", fill="none"), 
                                  
                                  gamma_cond + guides(color="none", fill="none"), 
                                  alpha_cond + guides(color="none", fill="none"), 
                                  actino_cond + guides(color="none", fill="none"), 
                                  bactero_DO + guides(color="none", fill="none"), ncol=4,
                                  labels=c("D", "E", "F", "G", "H", "I", "J", "K"))

######### combine shannon with the taxa-specific
plot_grid(multregres.plots.shannon.env.wat.zoop, Taxa.relabund.env.regr, ncol=1, nrow=2, rel_heights = c(0.8,2))
**Figure 5**. Plots of spatial and environmental variables with strongest correlations with microbial communities in water bacterioplankton and zooplankton hosts (see Table S6). (A-C) Correlations between Shannon diversity (top row) and latitude, dissolved oxygen, and conductivity. (D-K) Correlations between the relative abundance of bacteria classes (Gammaproteobacteria, Alphaproteobacteria, Actinobacteria, Bacteroidia) and spatial (middle row) and environmental predictors (bottom row).

Figure 5. Plots of spatial and environmental variables with strongest correlations with microbial communities in water bacterioplankton and zooplankton hosts (see Table S6). (A-C) Correlations between Shannon diversity (top row) and latitude, dissolved oxygen, and conductivity. (D-K) Correlations between the relative abundance of bacteria classes (Gammaproteobacteria, Alphaproteobacteria, Actinobacteria, Bacteroidia) and spatial (middle row) and environmental predictors (bottom row).

dev.copy(pdf, "figures/Fig5.Shann.env.Taxa.relabund.regr_rev.pdf", width=12, height=10)
dev.off()